ML Model | Notes | R (v4.1.3) implementation library and version |
---|---|---|
LASSO (Least Absolute Shrinkage and Selection Operator) | A regression model that, unlike the stepwise logistic regression used in the NRCBM study, adds an L1 regularization term that penalizes large coefficients. This shrinks the coefficients of less important predictors to exactly zero, so only the more important predictors are selected | library(glmnet) 4.1-3 |
Random Forest | An ensemble learning method for regression and classification. It grows many decision trees, each on a bootstrap sample of the training data (considering a random subset of predictors at each split), and predicts the mode (classification) or mean (regression) of the individual trees' predictions | library(randomForest) 4.7-1 |
Gradient Boosting | An ensemble approach that combines predictions from multiple weak models, such as decision trees or regression models, added one at a time so that each new model is fitted to the negative gradient of the loss (gradient descent in function space). It is suitable for both classification and regression | library(gbm) 2.1.8 |
XGBoost | Another ensemble approach but, unlike standard Gradient Boosting, it uses a second-order (Newton–Raphson) approximation of the loss and explicit regularization penalties on tree complexity when growing trees. It is considered to offer improved performance over Gradient Boosting, but at the expense of interpretability | library(xgboost) 1.5.2.1 |
Rpart | “Recursive partitioning” is a decision tree algorithm for generating classification, regression and survival trees. The resulting decision trees are considered easy to interpret | library(rpart) 4.1.16 |
PRE (Prediction Rule Ensembles) | Used for both regression and classification; models are based on a combination of very simple “if x then predict y” rules. PRE aims to optimize both accuracy and interpretability | library(pre) 1.0.4 |
Stepwise Logistic Regression (original model) | Logistic regression (logit link) with stepwise selection among pre-specified candidate predictors, used for classification | library(rms) 6.6-0 |
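
The LASSO row can be illustrated with a minimal sketch of L1-penalized least squares solved by cyclic coordinate descent, the optimization strategy glmnet is built on. It is written in standard-library Python rather than R and is not glmnet's implementation: it assumes a Gaussian response, centred predictors and no intercept, and the function names `soft_threshold` and `lasso_cd` are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything in [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (centred predictors, no intercept column); y is a list.
    Assumes no predictor column is all zeros.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # inner product of predictor j with the partial residual
            ss = 0.0   # sum of squares of predictor j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                ss += X[i][j] ** 2
            # Closed-form minimiser in beta[j] with all other coefficients held fixed.
            beta[j] = soft_threshold(rho / n, lam) / (ss / n)
    return beta


# Toy data: y = 3*x1 + 0.2*x2 with orthogonal predictors.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3.2, -2.8, 2.8, -3.2]
print(lasso_cd(X, y, lam=0.0))  # close to least squares, about [3.0, 0.2]
print(lasso_cd(X, y, lam=0.5))  # about [2.5, 0.0]: the weak predictor is dropped
```

The soft-thresholding step is what sets small coefficients to exactly zero, which is how the LASSO performs predictor selection rather than only shrinkage.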
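
The Random Forest row (bootstrap samples, random predictor subsets, prediction by the mode of the trees) can be sketched as follows. This is a simplified standard-library Python illustration, not the randomForest package: each "tree" is reduced to a single-split stump for brevity, and because a stump has only one split, drawing `mtry` candidate predictors once per tree is equivalent to drawing them at each node. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Best single split (fewest misclassifications) over the candidate features.

    Returns (feature, threshold, left_label, right_label).
    """
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    if best is None:  # no usable split in this bootstrap sample: predict its majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, mtry=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample with mtry random predictors."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        features = rng.sample(range(p), mtry)       # random subset of predictors
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Classification: the forest predicts the mode of the individual votes."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


rows = [[0, 1], [1, 0], [2, 2], [1, 1], [6, 7], [7, 6], [8, 9], [9, 8]]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 9]))
```

For regression, the vote count would be replaced by the mean of the trees' numeric predictions.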
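
The Gradient Boosting row describes weak learners fitted to the negative gradient of the loss. A minimal sketch for squared-error loss, where the negative gradient is simply the residual, is shown below in standard-library Python. It assumes one numeric predictor with at least two distinct values and uses regression stumps as the weak learners; it is an illustration of the technique, not the gbm package, and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one numeric feature.

    Returns (threshold, left_mean, right_mean); needs at least two distinct x values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Boosting for squared-error loss with stumps as the weak learners."""
    f0 = sum(ys) / len(ys)  # start from the constant that minimises squared error
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2 the negative gradient with respect to F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        # Take a small (shrunken) step along the fitted negative gradient.
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return f0, learning_rate, stumps


def predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 2.1, 1.9, 5.0, 5.2]
print(mse((sum(ys) / len(ys), 0.1, []), xs, ys))  # constant model only
print(mse(gradient_boost(xs, ys), xs, ys))         # lower training error after boosting
```

The `learning_rate` shrinkage makes each step small, trading more rounds for a smoother fit; for other losses (e.g. binary classification) only the gradient computation changes.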
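
The XGBoost row mentions a second-order (Newton–Raphson) approximation and penalties on tree complexity. The sketch below shows the resulting formulas for the log-loss: the optimal leaf weight $-G/(H+\lambda)$ and the split gain with a per-leaf penalty $\gamma$, where $G$ and $H$ are sums of gradients and Hessians in a node. It is standard-library Python, not the xgboost package; `reg_lambda` and `gamma` correspond to xgboost's `lambda` and `gamma` parameters, and the function names are hypothetical.

```python
import math


def grad_hess_logloss(y, raw_score):
    """Gradient and Hessian of the log-loss w.r.t. the raw (logit) score, for y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, reg_lambda=1.0):
    """Newton step for a leaf: the minimiser of G*w + 0.5*(H + lambda)*w^2."""
    return -G / (H + reg_lambda)


def split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left/right children (second-order approximation)."""
    def score(G, H):
        return G * G / (H + reg_lambda)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


ys = [1, 1, 0, 0]
gh = [grad_hess_logloss(y, 0.0) for y in ys]  # raw score 0 means p = 0.5 for every sample
G_left = sum(g for g, _ in gh[:2])
H_left = sum(h for _, h in gh[:2])
G_right = sum(g for g, _ in gh[2:])
H_right = sum(h for _, h in gh[2:])
print(leaf_weight(G_left, H_left), leaf_weight(G_right, H_right))  # about +0.667 and -0.667
print(split_gain(G_left, H_left, G_right, H_right))                # about 0.667: keep the split
print(split_gain(G_left, H_left, G_right, H_right, gamma=1.0))     # negative: gamma prunes it
```

Larger `reg_lambda` shrinks leaf weights toward zero, and a split is only worth making when its gain exceeds `gamma`; these are the complexity penalties referred to in the table.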
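
The Rpart row describes recursive partitioning. A minimal sketch of the idea for a classification tree on one numeric predictor is given below: find the split that most reduces Gini impurity (rpart's default criterion for classification), then repeat on each half until no split helps or a depth limit is reached. It is standard-library Python, not rpart itself; the midpoint thresholds, `max_depth` limit and function names are assumptions for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, labels):
    """Return (threshold, impurity_decrease) for the best split x <= threshold, or (None, 0.0)."""
    n = len(xs)
    parent = gini(labels)
    best = (None, 0.0)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint between neighbouring distinct values
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        if parent - child > best[1]:
            best = (t, parent - child)
    return best


def grow_tree(xs, labels, depth=0, max_depth=3):
    """A leaf is a class label; an internal node is (threshold, left_subtree, right_subtree)."""
    t, _ = best_split(xs, labels)
    if t is None or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    left = [(x, lab) for x, lab in zip(xs, labels) if x <= t]
    right = [(x, lab) for x, lab in zip(xs, labels) if x > t]
    return (t,
            grow_tree([x for x, _ in left], [lab for _, lab in left], depth + 1, max_depth),
            grow_tree([x for x, _ in right], [lab for _, lab in right], depth + 1, max_depth))


def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


xs = [1, 2, 3, 10, 11, 12]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(xs, labels))  # (6.5, 0.5)
tree = grow_tree(xs, labels)
print(tree, predict(tree, 0), predict(tree, 20))  # (6.5, 'no', 'yes') no yes
```

The fitted tree is a set of nested thresholds that can be read directly, which is why such trees are considered easy to interpret.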
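
The PRE row describes models built from simple “if x then predict y” rules. The prediction step of such a rule ensemble can be sketched as an intercept plus the weights of every rule that applies, passed through the logistic function for classification. In the pre package the rules are derived from a tree ensemble and weighted by a LASSO fit (the package can also add linear terms, omitted here); the rules, predictor names `x1`/`x2`, coefficients and function names below are hypothetical and were not fitted to any data.

```python
import math

# Hypothetical rules (description, condition, coefficient) for illustration only.
RULES = [
    ("x1 > 60", lambda x: x["x1"] > 60, 1.2),
    ("x2 <= 25", lambda x: x["x2"] <= 25, -0.8),
    ("x1 > 60 & x2 > 30", lambda x: x["x1"] > 60 and x["x2"] > 30, 0.5),
]
INTERCEPT = -1.0


def fired_rules(x):
    """Descriptions of the rules whose conditions hold for observation x."""
    return [desc for desc, cond, _ in RULES if cond(x)]


def linear_predictor(x):
    """Intercept plus the coefficients of every rule that applies to x."""
    return INTERCEPT + sum(coef for _, cond, coef in RULES if cond(x))


def predict_proba(x):
    """Binary classification: map the linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x)))


obs = {"x1": 70, "x2": 20}
print(fired_rules(obs))       # ['x1 > 60', 'x2 <= 25']
print(linear_predictor(obs))  # about -0.6
print(predict_proba(obs))
```

Listing the fired rules alongside each prediction is what gives rule ensembles their interpretability: every prediction decomposes into a handful of readable conditions and weights.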