Module 6: Traditional ML Methods
Topic 3: Decision Trees
Decision trees and Random forests
Decision-tree-based classifiers are very popular in machine learning and data science applications. The most popular methods are random forests and gradient-boosted classifiers, both of which are ensembles of decision trees. In this module, we will first discuss basic decision tree algorithms and then move on to random forests and gradient boosting.
One reason decision trees are popular is that they are inherently human-readable, at least as a single tree; a forest is harder to read. A single decision tree is really a flow chart, a type of diagram that humans have been making and reading for many years! The difference is that the splits in the flow chart are created automatically by a machine learning algorithm rather than by hand.
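To make the flow chart idea concrete, here is a minimal sketch in plain Python. The tree, its feature names (`humidity`, `wind_speed`), its thresholds, and its labels are all made up for illustration; a learned tree has the same shape, but the questions and thresholds come from the training algorithm.

```python
# Hypothetical hand-written tree (illustration only, not learned from data).
# Each internal node asks "is sample[feature] <= threshold?".
tree = {
    "question": ("humidity", 70),
    "yes": {"label": "play"},
    "no": {
        "question": ("wind_speed", 15),
        "yes": {"label": "play"},
        "no": {"label": "stay home"},
    },
}

def predict(node, sample):
    """Follow the flow chart from the root down to a leaf."""
    while "label" not in node:
        feature, threshold = node["question"]
        node = node["yes"] if sample[feature] <= threshold else node["no"]
    return node["label"]

print(predict(tree, {"humidity": 85, "wind_speed": 10}))  # prints "play"
```

Reading the prediction path out loud ("humidity is above 70, wind speed is at most 15, so play") is exactly how a person would read the flow chart.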
We will cover random forests in the next topic of this module.
For our reading, we will jump back to the remaining section in chapter 19 that we skipped in the previous module.
- Read Section 19.3
Decision trees
Graphic from this article at DataScience foundation (they often have good articles/graphics!)
Decision Trees: What kinds of trees are there?
We will first focus on the basic decision tree algorithm. This will include classification trees and regression trees.
What types of trees exist? Why do we want to study trees?
Copy of my slides
How do you grow a decision tree?
A tree can be read like a flow chart, but unlike a flow chart it is not designed by hand. The next two videos discuss how to grow a decision tree.
Copy of my slides
- Wikipedia's discussion of the Gini impurity score
- An example of how to use chi-squared as a decision tree score
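As a small illustration of the Gini impurity score, the sketch below computes it for a list of class labels using the standard definition, one minus the sum of squared class fractions. The function name `gini_impurity` and the example labels are our own choices, not taken from the slides or the linked pages.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum over classes of (fraction of that class)^2."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here for an empty node
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # pure node: 0.0
print(gini_impurity(["yes", "yes", "no", "no"]))    # evenly mixed node: 0.5
```

A score of 0 means every example in the node has the same class; for two classes the score peaks at 0.5 when the node is evenly mixed.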
Example of choosing the best attribute
Copy of my slides
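One common way to choose the best attribute is to split on each candidate in turn and keep the one whose child nodes have the lowest size-weighted Gini impurity. The sketch below does this for categorical attributes; the toy rows, the attribute names, and the helper names are invented for illustration and may differ from the example in the video.

```python
from collections import Counter, defaultdict

def gini_impurity(labels):
    """Gini impurity: 1 - sum over classes of (fraction of that class)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_impurity(rows, attribute, target):
    """Average impurity of the child nodes, weighted by their sizes."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())

def best_attribute(rows, attributes, target):
    """Pick the attribute whose split leaves the purest child nodes."""
    return min(attributes, key=lambda a: weighted_impurity(rows, a, target))

# Made-up toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

print(weighted_impurity(rows, "outlook", "play"))           # 0.0
print(weighted_impurity(rows, "windy", "play"))             # 0.5
print(best_attribute(rows, ["outlook", "windy"], "play"))   # outlook
```

Growing a full tree repeats this choice recursively on each child node until the nodes are pure or some stopping rule is met.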
Exercise
Complete the exercise on decision trees