Machine Learning Principles Can Improve Hip Fracture Prediction

Abstract: The aim was to apply machine learning principles to predict hip
fractures and estimate predictor importance in dual-energy X-ray
absorptiometry (DXA)-scanned men and women. DXA data from two Danish regions
between 1996 and 2006 were combined with national Danish patient data to
comprise 4722 women and 717 men with 5 years of follow-up time (original
cohort n=6606 men and women). Twenty-four statistical models were built on 75%
of data points through 5-fold, 5-repeat cross-validation and then validated on
the remaining 25% of data points to calculate area under the curve (AUC) and
calibrate probability estimates. The best models were retrained with
restricted predictor subsets to estimate the best subsets. For women,
bootstrap aggregated flexible discriminant analysis (“bagFDA”) performed best
with a test AUC of 0.92 [0.89; 0.94] and well-calibrated probabilities
following Naïve Bayes adjustments. A “bagFDA” model limited to 11 predictors
(among them bone mineral densities (BMD), biochemical glucose measurements,
general practitioner and dentist use) achieved a test AUC of 0.91 [0.88;
0.93]. For men, eXtreme Gradient Boosting (“xgbTree”) performed best with a
test AUC of 0.89 [0.82; 0.95], but with poor calibration in higher
probabilities. A ten-predictor subset (BMD, biochemical cholesterol and liver
function tests, penicillin use and osteoarthritis diagnoses) achieved a test
AUC of 0.86 [0.78; 0.94] using an “xgbTree” model. Machine learning can
improve hip fracture prediction beyond logistic regression using ensemble
models. Compiling data from international cohorts with longer follow-up and
performing similar machine learning procedures has the potential to further
improve discrimination and calibration.
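
The resampling scheme described above (a 75%/25% hold-out split followed by 5-fold, 5-repeat cross-validation on the training portion) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the seed, and the use of simple unstratified shuffling are all assumptions, and the original analysis may well have stratified the splits by outcome.

```python
import random

def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)

def repeated_kfold(indices, k=5, repeats=5, seed=0):
    """Yield (repeat, fold, fit_idx, val_idx) for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for r in range(repeats):
        # Reshuffle before each repeat so the fold membership differs.
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for f in range(k):
            val = folds[f]
            fit = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, fit, val

# 4722 is the size of the women's cohort; the rows themselves are hypothetical.
train, test = train_test_split_indices(4722)
splits = list(repeated_kfold(train))  # 5 folds x 5 repeats = 25 fit/validate pairs
```

Each candidate model would be fitted on `fit` and scored on `val` for all 25 pairs, with the held-out `test` indices touched only once, for the final AUC and calibration assessment.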

Conclusion: Hip fracture risk can be modelled with high discriminative
performance for men (test AUC of 0.89 [0.82; 0.95], sensitivity 100%,
specificity 69% at the Youden probability cut-off) and particularly for women
(test AUC of 0.91 [0.88; 0.94], sensitivity 88%, specificity 81% at the Youden
probability cut-off) using advanced predictive models. Ensemble models using
bootstrap aggregation and boosting performed best in both cohorts, and
probabilities can generally be calibrated well with a Naïve Bayes approach,
although calibration remains poor for high probability estimates in men.
Models of 11 predictors for women and 9 for men with combinations of DXA BMD
measurements and primary sector use achieved the highest numerical AUC values.
Further improvements in predictive capability are likely possible with
compilations of more data points and longer observation periods. We strongly
suggest the use of machine learning principles to model hip fracture risk, and
we welcome an effort to compile existing datasets and perform advanced
predictive modelling.
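
To make the reported metrics concrete, the sketch below computes an AUC and the Youden cut-off (the threshold maximising sensitivity + specificity - 1) from predicted probabilities. It is an illustration only: the function names are hypothetical and the ten data points are invented toy values, not data from this study.

```python
def roc_points(y_true, y_score):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(y_score)):
        # Predict "fracture" when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points

def auc(y_true, y_score):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as 0.5.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(y_true, y_score):
    """Return the (threshold, sensitivity, specificity) triple maximising Youden's J."""
    return max(roc_points(y_true, y_score), key=lambda p: p[1] + p[2] - 1)

# Toy data, invented for illustration (1 = hip fracture during follow-up).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

area = auc(y_true, y_score)
cutoff, sensitivity, specificity = youden_cutoff(y_true, y_score)
```

On the toy data the Youden cut-off lands at the lowest score among the fracture cases, giving full sensitivity at the cost of one false positive. The men's model shows the same trade-off (sensitivity 100%, specificity 69%).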