
Table 5 Alternative ML models, notes, and implementation library

From: Replicability and reproducibility of predictive models for diagnosis of depression among young adults using Electronic Health Records

| ML Model | Notes | Implementation library for R (v 4.1.3) |
|---|---|---|
| LASSO (Least Absolute Shrinkage and Selection Operator) | A regression model that, unlike the stepwise logistic regression in the NRCBM study, uses a regularization term to penalize complex models, thus supporting the selection of only the more important predictors | glmnet 4.1-3 |
| Random Forest | An ensemble learning method supporting regression and classification. Creates multiple decision trees from subsets of the training data, then makes predictions from the mode/mean of the individual trees | randomForest 4.7-1 |
| Gradient Boosting | An ensemble approach that combines predictions from multiple weaker models, such as decision trees or regression models, using gradient descent to improve accuracy. Suitable for both classification and regression | gbm 2.1.8 |
| XGBoost | Another ensemble approach but, unlike Gradient Boosting, it uses a Newton–Raphson method and additional penalization techniques for tree selection. Considered to offer improved performance over, e.g., Gradient Boosting, at the expense of interpretability | xgboost 1.5.2.1 |
| Rpart | "Recursive partitioning" is a decision tree algorithm for generating classification, regression, and survival trees. The resulting decision trees are considered easy to interpret | rpart 4.1.16 |
| PRE (Prediction Rule Ensembles) | Used for both regression and classification; models are based on a combination of very simple "if x then predict y" rules. PRE aims to optimize both accuracy and interpretability | pre 1.0.4 |
| Stepwise Logistic Regression (original model) | Stepwise regression model based on the logit function, used with pre-specified predictors for classification | rms 6.6-0 |

Note: Fuller descriptions of these methods are available in the documentation accompanying the libraries and in other sources such as ML papers and textbooks. Code vignettes will be made available via this Open Science Framework link: https://osf.io/573uw/
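
The short sketches that follow are illustrative only: they show one plausible way to fit each model in the table using the listed R packages, on a small simulated data set with made-up variable names (outcome, X1–X5); they are not the study's code or the OSF vignettes referenced above.

A minimal LASSO sketch with glmnet, assuming a numeric predictor matrix and a binary outcome; alpha = 1 selects the pure L1 penalty that shrinks some coefficients exactly to zero:

```r
library(glmnet)

# Simulated stand-in data: 200 rows, 10 numeric predictors, binary outcome.
set.seed(42)
X <- matrix(rnorm(200 * 10), ncol = 10)
y <- rbinom(200, 1, 0.3)

# Cross-validated LASSO logistic regression (alpha = 1 gives the L1 penalty).
cv_fit <- cv.glmnet(X, y, family = "binomial", alpha = 1)

# Coefficients at the penalty minimising cross-validated deviance;
# predictors with zero coefficients have effectively been dropped.
coef(cv_fit, s = "lambda.min")
```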
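
A Random Forest sketch with randomForest; using a factor outcome makes it a classification forest whose trees vote on the predicted class (the mode-based aggregation described in the table):

```r
library(randomForest)

# Simulated stand-in data with a binary factor outcome.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- factor(rbinom(200, 1, 0.3))

# Grow 500 trees, each on a bootstrap sample with random predictor subsets.
rf_fit <- randomForest(outcome ~ ., data = dat, ntree = 500)

# Aggregated class probabilities for the first few rows.
head(predict(rf_fit, newdata = dat, type = "prob"))
```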
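
A Gradient Boosting sketch with gbm; the "bernoulli" distribution gives boosted classification trees, where each new tree is fitted to the gradient of the loss of the current ensemble (tuning values here are arbitrary):

```r
library(gbm)

# Simulated stand-in data; gbm expects a 0/1 numeric outcome for "bernoulli".
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- rbinom(200, 1, 0.3)

# 500 shallow trees added sequentially with a small learning rate (shrinkage).
gbm_fit <- gbm(outcome ~ ., data = dat, distribution = "bernoulli",
               n.trees = 500, interaction.depth = 2, shrinkage = 0.05)

# Predicted probabilities using all 500 trees.
head(predict(gbm_fit, newdata = dat, n.trees = 500, type = "response"))
```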
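
An XGBoost sketch with xgboost; xgb.train works on the package's own DMatrix format, and eta, max_depth, and the built-in regularisation terms are the knobs alluded to in the table (the values shown are placeholders):

```r
library(xgboost)

# Simulated stand-in data packed into xgboost's matrix format.
set.seed(42)
X <- matrix(rnorm(200 * 5), ncol = 5)
y <- rbinom(200, 1, 0.3)
dtrain <- xgb.DMatrix(data = X, label = y)

# Boosted trees with second-order (Newton-style) updates for binary classification.
xgb_fit <- xgb.train(params = list(objective = "binary:logistic",
                                   eta = 0.1, max_depth = 3),
                     data = dtrain, nrounds = 200)

# Predicted probabilities for the training rows.
head(predict(xgb_fit, dtrain))
```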
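
An rpart sketch; method = "class" grows a single classification tree by recursive partitioning, and printing the fit lists the splits, which is what makes the result easy to interpret:

```r
library(rpart)

# Simulated stand-in data with a binary factor outcome.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- factor(rbinom(200, 1, 0.3))

# One classification tree grown by recursive partitioning.
tree_fit <- rpart(outcome ~ ., data = dat, method = "class")

# The printed tree shows each split and the class predicted in each leaf.
print(tree_fit)
head(predict(tree_fit, newdata = dat, type = "prob"))
```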
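
A PRE sketch with pre; the ensemble is built from simple tree-derived rules (plus optional linear terms) and then pruned with a LASSO-type penalty, so the printed fit is a readable list of weighted "if x then predict y" rules:

```r
library(pre)

# Simulated stand-in data with a binary factor outcome.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- factor(rbinom(200, 1, 0.3))

# Fit a prediction rule ensemble for binary classification.
pre_fit <- pre(outcome ~ ., data = dat, family = "binomial")

# The surviving rules and their coefficients.
print(pre_fit)
```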
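
A logistic regression sketch with rms, not the study's exact procedure: lrm() fits the logit model on pre-specified predictors, and fastbw() is one rms routine for (backward) stepwise variable selection:

```r
library(rms)

# Simulated stand-in data; the matrix columns are named X1..X5 by default.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- rbinom(200, 1, 0.3)

# Logistic regression on pre-specified predictors.
lrm_fit <- lrm(outcome ~ X1 + X2 + X3 + X4 + X5, data = dat, x = TRUE, y = TRUE)

# Fast backward elimination as one form of stepwise selection offered by rms.
fastbw(lrm_fit)
```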