1. The Random Forest (RF) machine learning approach
- Breiman, Leo (2001). “Random Forests”. Machine Learning 45 (1): 5-32
- Leo Breiman’s & Adele Cutler’s Website
- RF is an ensemble classifier that builds many decision trees (e.g. 1000)
Figure: mathworks.com
- RF is well suited to data where the number of variables (predictors) is higher than the number of observations (samples)
- RF builds a predictive machine learning model by aggregating multiple deep (unpruned) trees
- for each tree, only a subset of the data is used:
  - observations (samples) are drawn by sampling with replacement, which reduces the variance of the aggregated model
  - a random subset of the variables (predictors) is selected, so that less dominant predictors also get used and additional predictors can be identified
- together, this helps prevent overfitting the model to a (possibly insufficient) training set
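
The per-tree sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a full RF implementation: no trees are actually trained, the helper names `draw_tree_sample` and `majority_vote` are hypothetical, the data dimensions are made up, and the subset size of sqrt(number of predictors) is a commonly used heuristic rather than something stated in these notes. Note also that the sketch follows the per-tree description given here; in Breiman's algorithm the random predictor subset is redrawn at every split of a tree.

```python
import random
from collections import Counter


def draw_tree_sample(n_samples, n_features, n_sub_features, rng):
    """Pick the observations and predictors that one tree would be trained on."""
    # Bootstrap: draw observation indices WITH replacement (duplicates allowed).
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Random predictor subset: draw predictor indices WITHOUT replacement.
    cols = sorted(rng.sample(range(n_features), n_sub_features))
    return rows, cols


def majority_vote(predictions):
    """Aggregate the class labels predicted by the individual trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)             # fixed seed so the sketch is reproducible
n_samples, n_features = 20, 100    # assumed toy sizes: more predictors than observations
n_sub = int(n_features ** 0.5)     # assumed heuristic: sqrt(number of predictors)
n_trees = 1000

forest_samples = [
    draw_tree_sample(n_samples, n_features, n_sub, rng) for _ in range(n_trees)
]

# Observations never drawn for a given tree are "out-of-bag" for that tree.
rows, cols = forest_samples[0]
out_of_bag = set(range(n_samples)) - set(rows)
print(len(rows), len(cols), len(out_of_bag))
```

Each tree would then be grown, unpruned, on its own `rows` and `cols`, and the final class would be obtained with something like `majority_vote(["a", "b", "a"])`, which returns `"a"`. Because every tree sees a different bootstrap sample and a different set of predictors, the trees are less correlated with each other, which is what makes averaging them effective.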