Original Research
Predicting pharmacotherapeutic outcomes for type 2 diabetes: An evaluation of three approaches to leveraging electronic health record data from multiple sources

https://doi.org/10.1016/j.jbi.2022.104001
Under a Creative Commons license
open access

Highlights

  • Three methods for building predictive models from multicenter electronic health record data are compared for clinical decision support in type 2 diabetes mellitus pharmacotherapy.

  • Two of the three approaches, Selecting Better and Weighted Average, allowed the source data to remain within institutional boundaries by using pre-built prediction models; the third, Combining Data, aggregated raw patient data into a single dataset before the model was built.

  • The Weighted Average and Combining Data approaches achieved higher prediction performance than single-institution models.

  • Combining Data achieved the broadest coverage of treatment patterns.

Abstract

Electronic health record (EHR) data are increasingly used to develop prediction models to support clinical care, including the care of patients with common chronic conditions. A key challenge for individual healthcare systems in developing such models is that they may not be able to achieve the desired degree of robustness using only their own data. A potential solution, combining data from multiple sources, faces barriers such as the need for data normalization and concerns about sharing patient information across institutions. To address these challenges, we evaluated three alternative approaches to using EHR data from multiple healthcare systems in predicting the outcome of pharmacotherapy for type 2 diabetes mellitus (T2DM). Two of the three approaches, named Selecting Better (SB) and Weighted Average (WA), allowed the data to remain within institutional boundaries by using pre-built prediction models; the third, named Combining Data (CD), aggregated raw patient data into a single dataset. The prediction performance and prediction coverage of the resulting models were compared to single-institution models to help judge the relative value of adding external data and to determine the best method to generate optimal models for clinical decision support. The results showed that models using WA and CD achieved higher prediction performance than single-institution models for common treatment patterns. CD outperformed the other two approaches in prediction coverage, which we defined as the number of treatment patterns predicted with an area under the receiver operating characteristic curve (AUC-ROC) of 0.70 or more. We concluded that (1) WA is an effective option for improving prediction performance for common treatment patterns when data cannot be shared across institutional boundaries and (2) CD is the most effective approach when such sharing is possible, especially for increasing the range of treatment patterns that can be predicted to support clinical decision making.
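To make the abstract's terms concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state how WA weights the pre-built models or what criterion SB uses to select one, so the weighting scheme (caller-supplied weights) and the selection rule (higher validation AUC-ROC) are assumptions, as are all function names. Only the coverage definition, the number of treatment patterns with an AUC-ROC of 0.70 or more, is taken directly from the text.

```python
from typing import Dict, List, Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC-ROC: the chance that a random positive case
    scores higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def weighted_average(
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    weight_a: float,
    weight_b: float,
) -> List[float]:
    """Blend the predicted probabilities of two pre-built models.
    ASSUMPTION: the weights are supplied by the caller (for example,
    training-set sizes); the source does not specify the scheme."""
    if len(probs_a) != len(probs_b):
        raise ValueError("prediction lists must have the same length")
    total = weight_a + weight_b
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [(weight_a * a + weight_b * b) / total
            for a, b in zip(probs_a, probs_b)]


def selecting_better(
    auc_a: float,
    auc_b: float,
    probs_a: Sequence[float],
    probs_b: Sequence[float],
) -> Sequence[float]:
    """Return the predictions of whichever model validated better.
    ASSUMPTION: 'better' means higher AUC-ROC on local validation data."""
    return probs_a if auc_a >= auc_b else probs_b


def prediction_coverage(
    auc_by_pattern: Dict[str, float], threshold: float = 0.70
) -> int:
    """Count treatment patterns predicted with AUC-ROC >= threshold
    (the abstract's definition of prediction coverage)."""
    return sum(1 for auc in auc_by_pattern.values() if auc >= threshold)
```

In this sketch only predicted probabilities and summary metrics pass between institutions, which is the property that lets SB and WA keep patient-level data in place. CD has no counterpart here because it pools the raw records before any model is trained.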

Keywords

Artificial intelligence
Clinical decision support system
Health information interoperability
Disease management
Chronic disease

Abbreviations

AUC-ROC: Area Under the Curve - Receiver Operating Characteristic
CCS: Clinical Classification Software
CD: Combining Data
CDS: Clinical Decision Support
DPP4: Dipeptidyl-Peptidase 4
eGFR: Estimated Glomerular Filtration Rate
EHR: Electronic Health Record
GLP1: Glucagon-like Peptide 1 Receptor Agonist
HbA1c: Hemoglobin A1c
IM: Independent Modeling
INPC: Indiana Network for Patient Care
INS: Long-Acting Insulin
LDL: Low-Density Lipoprotein
LR: Logistic Regression
MET: Metformin
NDC: National Drug Code
PDC: Proportion of Days Covered
SB: Selecting Better
SGLT2: Sodium-Glucose Cotransporter-2 Inhibitor
SUL: Sulfonylurea
TPGE: Treatment Pathway Graph Estimation
TZD: Thiazolidinedione
T2DM: Type 2 Diabetes Mellitus
UUH: University of Utah Health
WA: Weighted Average


1

Co-senior author.