
Table 5 Alternative ML models, notes, and implementation library

From: Replicability and reproducibility of predictive models for diagnosis of depression among young adults using Electronic Health Records

| ML Model | Notes | Implementation library for R (v 4.1.3) |
|---|---|---|
| LASSO (Least Absolute Shrinkage and Selection Operator) | A regression model that, unlike the stepwise logistic regression in the NRCBM study, uses a regularization term to penalize complex models, thus supporting the selection of only the more important predictors | glmnet 4.1-3 |
| Random Forest | An ensemble learning method supporting regression and classification. Creates multiple decision trees from subsets of the training data, then makes predictions from the mode/mean of the individual trees | randomForest 4.7-1 |
| Gradient Boosting | An ensemble approach that combines predictions from multiple weaker models, such as decision trees or regression models, using gradient descent to improve accuracy. Suitable for both classification and regression | gbm 2.1.8 |
| XGBoost | Another ensemble approach but, unlike Gradient Boosting, it uses a Newton–Raphson method and additional penalization techniques for tree selection. Considered to offer improved performance over, e.g., Gradient Boosting, at the expense of interpretability | xgboost 1.5.2.1 |
| Rpart | "Recursive partitioning" is a decision tree algorithm for generating classification, regression, and survival trees. The resulting decision trees are considered easy to interpret | rpart 4.1.16 |
| PRE (Prediction Rule Ensembles) | Used for both regression and classification; models are based on a combination of very simple "if x then predict y" rules. PRE aims to optimize both accuracy and interpretability | pre 1.0.4 |
| Stepwise Logistic Regression (original model) | Stepwise regression model based on the logit function, used with pre-specified predictors for classification | rms 6.6-0 |

Note: Fuller descriptions of these methods are available in the documentation accompanying the libraries and in other sources such as ML papers and textbooks. Code vignettes will be made available via this Open Science Framework link: https://osf.io/573uw/
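
The short sketches that follow are illustrative only: they show one plausible way to fit each model in the table using the listed R packages, on a small simulated data set with made-up variable names (outcome, X1–X5); they are not the study's code or the OSF vignettes referenced above.

A minimal LASSO sketch with glmnet, assuming a numeric predictor matrix and a binary outcome; alpha = 1 selects the pure L1 penalty that shrinks some coefficients exactly to zero:

```r
library(glmnet)

# Simulated stand-in data: 200 rows, 10 numeric predictors, binary outcome.
set.seed(42)
X <- matrix(rnorm(200 * 10), ncol = 10)
y <- rbinom(200, 1, 0.3)

# Cross-validated LASSO logistic regression (alpha = 1 gives the L1 penalty).
cv_fit <- cv.glmnet(X, y, family = "binomial", alpha = 1)

# Coefficients at the penalty minimising cross-validated deviance;
# predictors with zero coefficients have effectively been dropped.
coef(cv_fit, s = "lambda.min")
```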
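
A Random Forest sketch with randomForest; using a factor outcome makes it a classification forest whose trees vote on the predicted class (the mode-based aggregation described in the table):

```r
library(randomForest)

# Simulated stand-in data with a binary factor outcome.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- factor(rbinom(200, 1, 0.3))

# Grow 500 trees, each on a bootstrap sample with random predictor subsets.
rf_fit <- randomForest(outcome ~ ., data = dat, ntree = 500)

# Aggregated class probabilities for the first few rows.
head(predict(rf_fit, newdata = dat, type = "prob"))
```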
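
A Gradient Boosting sketch with gbm; the "bernoulli" distribution gives boosted classification trees, where each new tree is fitted to the gradient of the loss of the current ensemble (tuning values here are arbitrary):

```r
library(gbm)

# Simulated stand-in data; gbm expects a 0/1 numeric outcome for "bernoulli".
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- rbinom(200, 1, 0.3)

# 500 shallow trees added sequentially with a small learning rate (shrinkage).
gbm_fit <- gbm(outcome ~ ., data = dat, distribution = "bernoulli",
               n.trees = 500, interaction.depth = 2, shrinkage = 0.05)

# Predicted probabilities using all 500 trees.
head(predict(gbm_fit, newdata = dat, n.trees = 500, type = "response"))
```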
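
An XGBoost sketch with xgboost; xgb.train works on the package's own DMatrix format, and eta, max_depth, and the built-in regularisation terms are the knobs alluded to in the table (the values shown are placeholders):

```r
library(xgboost)

# Simulated stand-in data packed into xgboost's matrix format.
set.seed(42)
X <- matrix(rnorm(200 * 5), ncol = 5)
y <- rbinom(200, 1, 0.3)
dtrain <- xgb.DMatrix(data = X, label = y)

# Boosted trees with second-order (Newton-style) updates for binary classification.
xgb_fit <- xgb.train(params = list(objective = "binary:logistic",
                                   eta = 0.1, max_depth = 3),
                     data = dtrain, nrounds = 200)

# Predicted probabilities for the training rows.
head(predict(xgb_fit, dtrain))
```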
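
An rpart sketch; method = "class" grows a single classification tree by recursive partitioning, and printing the fit lists the splits, which is what makes the result easy to interpret:

```r
library(rpart)

# Simulated stand-in data with a binary factor outcome.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- factor(rbinom(200, 1, 0.3))

# One classification tree grown by recursive partitioning.
tree_fit <- rpart(outcome ~ ., data = dat, method = "class")

# The printed tree shows each split and the class predicted in each leaf.
print(tree_fit)
head(predict(tree_fit, newdata = dat, type = "prob"))
```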
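
A PRE sketch with pre; the ensemble is built from simple tree-derived rules (plus optional linear terms) and then pruned with a LASSO-type penalty, so the printed fit is a readable list of weighted "if x then predict y" rules:

```r
library(pre)

# Simulated stand-in data with a binary factor outcome.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- factor(rbinom(200, 1, 0.3))

# Fit a prediction rule ensemble for binary classification.
pre_fit <- pre(outcome ~ ., data = dat, family = "binomial")

# The surviving rules and their coefficients.
print(pre_fit)
```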
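
A logistic regression sketch with rms, not the study's exact procedure: lrm() fits the logit model on pre-specified predictors, and fastbw() is one rms routine for (backward) stepwise variable selection:

```r
library(rms)

# Simulated stand-in data; the matrix columns are named X1..X5 by default.
set.seed(42)
dat <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
dat$outcome <- rbinom(200, 1, 0.3)

# Logistic regression on pre-specified predictors.
lrm_fit <- lrm(outcome ~ X1 + X2 + X3 + X4 + X5, data = dat, x = TRUE, y = TRUE)

# Fast backward elimination as one form of stepwise selection offered by rms.
fastbw(lrm_fit)
```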