SKETCHINESS ALERT
Recursive partitioning
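Roughly: grow the tree by greedily picking the split that most reduces the node's error, then recurse on each side. A minimal single-feature regression sketch of that idea (the SSE criterion and the `max_depth` stopping rule here are illustrative assumptions, not any particular library's method):

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=3):
    """Greedy recursive partitioning for a single-feature regression tree (sketch)."""
    # Stop splitting at the depth limit or when the node is too small.
    if depth == max_depth or len(y) < 2:
        return {"leaf": True, "value": y.mean()}

    # Try every candidate threshold; keep the split with the lowest total SSE.
    best = None
    for t in np.unique(X):
        left, right = y[X <= t], y[X > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t)

    if best is None:
        return {"leaf": True, "value": y.mean()}

    _, t = best
    return {
        "leaf": False,
        "threshold": t,
        # Recurse on each side of the chosen split.
        "left": build_tree(X[X <= t], y[X <= t], depth + 1, max_depth),
        "right": build_tree(X[X > t], y[X > t], depth + 1, max_depth),
    }
```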
Pruning & Regularization
Split too much → overfitting
Split too little → underfitting
Therefore, we need to find a middle ground between the two, e.g., by penalizing tree size:
$$ Cost(T)=Err(T)+\alpha L(T) $$
where $L(T)$ is the number of leaves and $\alpha \ge 0$ controls how strongly larger trees are penalized.
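One practical way to pick $\alpha$ is scikit-learn's cost-complexity pruning path; a rough sketch (the dataset and train/validation split are just for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Candidate alphas for Cost(T) = Err(T) + alpha * L(T)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Refit with each alpha and keep the tree with the best validation accuracy.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda clf: clf.score(X_val, y_val),
)
print(best.get_n_leaves(), best.score(X_val, y_val))
```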
Intuition & main idea:
A decision tree model builds only a single tree. This gives good interpretability but is prone to overfitting → what if we subsample the data, train multiple trees, and average their predictions (or take a majority vote)?
Implementation
For each tree: draw a bootstrap sample of the training data, fit the tree on that sample, then aggregate all trees' predictions by averaging (regression) or majority voting (classification), as in the sketch below.
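A minimal bagging sketch along these lines, assuming `DecisionTreeRegressor` as the base learner (random forests additionally subsample features at each split, which this sketch omits):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, n_trees=100, seed=0):
    """Train each tree on a bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def predict_bagged(trees, X):
    """Average the per-tree predictions (use a majority vote for classification)."""
    return np.mean([t.predict(X) for t in trees], axis=0)
```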
Serial (sequential) boosting: put more weight on wrongly classified instances, so each new learner focuses on the examples the previous ones got wrong.
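To make the reweighting concrete, a rough AdaBoost-style sketch (labels assumed to be in {-1, +1}; the specific update rule is the classic AdaBoost one, shown as one possible instantiation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """y must be in {-1, +1}; returns the stumps and their vote weights."""
    w = np.full(len(y), 1 / len(y))  # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error of this round (clipped to avoid division by zero).
        err = np.clip(w[pred != y].sum(), 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        # Up-weight the instances this stump got wrong, down-weight the rest.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```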
Intuition:
Recall that linear regression treats the residual as irreducible noise, i.e., something that cannot be predicted. GBMs, in contrast, treat the residual as something "learnable": each new model is fit to the residuals left by the current one.
$$ f_1(x) \approx y \newline f_2(x) \approx y-f_1(x) \newline f_3(x) \approx y-f_1(x)-f_2(x) \newline \dots \newline \hat{y}=f_1(x)+f_2(x)+f_3(x)+\dots $$
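A bare-bones sketch of this residual-fitting loop for squared loss; the learning rate and tree depth are assumptions added for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_trees=100, lr=0.1):
    """Each new tree is trained on the residuals left by the current ensemble."""
    pred = np.zeros(len(y), dtype=float)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                     # what the model still gets wrong
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        pred += lr * tree.predict(X)            # shrink each tree's contribution
        trees.append(tree)
    return trees

def gbm_predict(trees, X, lr=0.1):
    return lr * sum(tree.predict(X) for tree in trees)
```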