Module 6: Traditional ML Methods


Topic 4: Ensembles

Ensembles of ML methods

In many cases, a single ML model can be outperformed by an ensemble of diverse models.  Two popular approaches for building ensembles start from the decision trees that we just learned about.  For this topic, we will discuss why ensemble methods can be so powerful and then overview two popular methods:  random forests and gradient-boosted forests.

The reading for this topic is Section 19.8 (Ensemble Methods).

Motivations for ensemble methods

When training a single ML model, there is a tradeoff between the bias of your predictions and the variance of your predictions.  Ensembles help with this tradeoff: averaging the predictions of many diverse models can reduce variance without increasing bias.  The quick video below explains why this motivates training ensemble methods.

Copy of my slides
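To see the variance-reduction effect numerically, here is a quick sketch (my own illustration, not from the reading or the video; it assumes NumPy is installed).  Each "model" is just the true value plus independent noise, so every model is unbiased but high-variance, and averaging 25 of them shrinks the variance by roughly a factor of 25.

```python
# Illustration (not from the course materials): averaging many diverse,
# unbiased, high-variance predictors yields a much lower-variance prediction.
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
n_models, n_trials = 25, 10_000

# Each model's prediction = truth + independent noise (no bias, variance 1).
predictions = true_value + rng.normal(0.0, 1.0, size=(n_trials, n_models))

single_var = predictions[:, 0].var()           # variance of one model (~1.0)
ensemble_var = predictions.mean(axis=1).var()  # variance of the average (~1/25)

print(f"single model variance: {single_var:.3f}")
print(f"ensemble variance:     {ensemble_var:.3f}")
```

The catch, of course, is that real models trained on the same data are not independent, which is exactly the problem random forests address next.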

Random Forests 

Random forests are an ensemble of decision trees.  Watch the video below to learn how to create a diverse set of trees.  If you just trained every tree on the exact same data, you would end up with a forest of identical trees, because the decision tree algorithm is deterministic (except for breaking ties).  You need to inject randomness into the algorithm to train a diverse forest: the standard tricks are training each tree on a bootstrap resample of the data (bagging) and considering only a random subset of the features at each split.

Copy of my slides
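If you want to see those two randomization tricks in code, here is a minimal sketch (assuming scikit-learn is available; the dataset and parameter choices are placeholders of mine, not from the lecture) that grows a small voting forest by hand:

```python
# Sketch of a hand-rolled random forest: (1) each tree sees a bootstrap
# resample of the data, and (2) only a random subset of features is
# considered at each split (handled by max_features="sqrt").
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

trees = []
for i in range(100):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap: sample with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# The forest predicts by majority vote over its trees.
votes = np.stack([t.predict(X) for t in trees])
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy of the voting forest:", (forest_pred == y).mean())
```

In practice you would just use scikit-learn's RandomForestClassifier, which wraps exactly this recipe; the sketch is only meant to expose where the diversity comes from.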

Gradient Boosting

Random forests are incredibly powerful, but they are not the only way to do ensemble learning.  Another approach is called boosting.  In the video below, I discuss boosting in general (it can be applied to many different ML methods!) and then how to use it to grow a forest of trees one at a time, an approach called gradient boosting.

Copy of my slides
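Here is a minimal sketch of the core idea (again assuming scikit-learn; the data, depth, and learning rate are illustrative choices of mine).  With squared-error loss, the negative gradient is just the residual, so each new shallow tree is fit to what the current ensemble still gets wrong:

```python
# Sketch of gradient boosting for regression with squared error:
# fit a sequence of weak learners, each to the residuals of the
# ensemble so far, and add them in with a small learning rate.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

learning_rate = 0.1
pred = np.full_like(y, y.mean(), dtype=float)  # start from a constant model
trees = []
for _ in range(200):
    residuals = y - pred                       # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2)  # weak learner: a shallow tree
    tree.fit(X, residuals)
    pred += learning_rate * tree.predict(X)    # small step toward the residuals
    trees.append(tree)

print("training MSE after boosting:", ((y - pred) ** 2).mean())
```

Note the contrast with random forests: there the trees are deep, independent, and averaged to reduce variance; here they are shallow, built sequentially, and summed to reduce bias, with the learning rate controlling how aggressively each tree corrects the last.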

Exercise

Complete the exercise on ensemble tree methods.