# Machine Learning Principles Can Improve Hip Fracture Prediction

2018 Mar 09

**Abstract:** We applied machine learning principles to predict hip fractures and estimate predictor importance in dual-energy X-ray absorptiometry (DXA)-scanned men and women. DXA data from two Danish regions between 1996 and 2006 were combined with national Danish patient data to comprise 4722 women and 717 men with 5 years of follow-up time (original cohort n=6606 men and women). Twenty-four statistical models were built on 75% of data points through 5-fold (k = 5), 5-repeat cross-validation, and then validated on the remaining 25% of data points to calculate area under the curve (AUC) and calibrate probability estimates. The best models were retrained with restricted predictor subsets to estimate the best subsets. For women, bootstrap aggregated flexible discriminant analysis (“bagFDA”) performed best with a test AUC of 0.92 [0.89; 0.94] and well-calibrated probabilities following Naïve Bayes adjustments. A “bagFDA” model limited to 11 predictors (among them bone mineral densities (BMD), biochemical glucose measurements, general practitioner and dentist use) achieved a test AUC of 0.91 [0.88; 0.93]. For men, eXtreme Gradient Boosting (“xgbTree”) performed best with a test AUC of 0.89 [0.82; 0.95], but with poor calibration in higher probabilities. A ten predictor subset (BMD, biochemical cholesterol and liver function tests, penicillin use and osteoarthritis diagnoses) achieved a test AUC of 0.86 [0.78; 0.94] using an “xgbTree” model. Machine learning can improve hip fracture prediction beyond logistic regression using ensemble models. Compiling data from international cohorts of longer follow-up and performing similar machine learning procedures have the potential to further improve discrimination and calibration.
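
The resampling scheme described in the abstract (a 75%/25% split, then 5-fold cross-validation repeated 5 times on the training portion, then AUC on the held-out portion) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names are hypothetical, the split is a simple random one (the abstract does not say whether it was stratified), and the AUC is computed by pairwise comparison of cases and non-cases.

```python
import random

def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)

def repeated_kfold(train_indices, k=5, repeats=5, seed=0):
    """Yield (fit, validate) index lists: k folds, reshuffled for each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(train_indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validate = folds[i]
            fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield fit, validate

def auc(labels, scores):
    """Probability that a random case scores above a random non-case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cohort of 100 subjects, used only to show the shapes involved.
train, test = train_test_split_indices(100)
splits = list(repeated_kfold(train))  # 5 folds x 5 repeats = 25 fit/validate pairs
```

Each candidate model would be fitted on `fit` and scored on `validate` for all 25 pairs to choose tuning parameters. The `test` indices are touched only once, to report the final AUC.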

**Conclusion:** We conclude that hip fracture risk can be modelled with high discriminative performance for men (test AUC of 0.89 [0.82; 0.95], sensitivity 100%, specificity 69% at the Youden probability cut-off) and particularly for women (test AUC 0.91 [0.88; 0.94], sensitivity 88%, specificity 81% at the Youden probability cut-off) using advanced predictive models. Ensemble models using bootstrap aggregation and boosting performed best in both cohorts, and probabilities can generally be calibrated well with a Naïve Bayes approach, although calibration was poor for high probability estimates in men. Models of 11 predictors for women and 9 for men with combinations of DXA BMD measurements and primary sector use achieved the highest numerical AUC values. Further improvements in predictive capability are likely possible with compilations of more data points and longer observation periods. We strongly suggest the use of machine learning principles to model hip fracture risk, and we welcome an effort to compile existing datasets and perform advanced predictive modelling.
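
The sensitivities and specificities above are reported at the Youden probability cut-off, the threshold that maximizes J = sensitivity + specificity - 1. A minimal sketch of how such a cut-off can be found is given below. The function name, labels, and probabilities are made up for illustration and are not study data.

```python
def youden_cutoff(labels, probabilities):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(probabilities)):
        # Predict a fracture when the estimated probability is >= cutoff.
        tp = sum(1 for y, p in zip(labels, probabilities) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(labels, probabilities) if y == 0 and p < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]

# Toy example: 1 = hip fracture during follow-up, 0 = no fracture.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
cutoff, sens, spec = youden_cutoff(labels, probs)
# Here the cut-off is 0.35, giving sensitivity 1.0 and specificity 0.8.
```

Because J weights sensitivity and specificity equally, the chosen cut-off can sit at a low probability when fractures are rare. That would be consistent with a high sensitivity paired with a more moderate specificity, as reported for men.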