SKETCHINESS ALERT
Recursive partitioning
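Roughly: grow the tree by greedily picking the split that most reduces the node's error, then recurse on each side. A minimal single-feature regression sketch of that idea (the SSE criterion and the `max_depth` stopping rule here are illustrative assumptions, not any particular library's method):

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=3):
    """Greedy recursive partitioning for a single-feature regression tree (sketch)."""
    # Stop splitting at the depth limit or when the node is too small.
    if depth == max_depth or len(y) < 2:
        return {"leaf": True, "value": y.mean()}

    # Try every candidate threshold; keep the split with the lowest total SSE.
    best = None
    for t in np.unique(X):
        left, right = y[X <= t], y[X > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t)

    if best is None:
        return {"leaf": True, "value": y.mean()}

    _, t = best
    return {
        "leaf": False,
        "threshold": t,
        # Recurse on each side of the chosen split.
        "left": build_tree(X[X <= t], y[X <= t], depth + 1, max_depth),
        "right": build_tree(X[X > t], y[X > t], depth + 1, max_depth),
    }
```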
Pruning & Regularization
Split too much → overfitting
Split too little → underfitting
Therefore, we need to find a middle ground between the two, e.g., by penalizing tree size:
$$ Cost(T)=Err(T)+\alpha L(T) $$
where $L(T)$ is the number of leaves and $\alpha \ge 0$ controls how strongly larger trees are penalized.
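One practical way to pick $\alpha$ is scikit-learn's cost-complexity pruning path; a rough sketch (the dataset and train/validation split are just for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Candidate alphas for Cost(T) = Err(T) + alpha * L(T)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Refit with each alpha and keep the tree with the best validation accuracy.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda clf: clf.score(X_val, y_val),
)
print(best.get_n_leaves(), best.score(X_val, y_val))
```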
Intuition & main idea:
A decision tree model builds only a single tree. This gives good interpretability but is prone to overfitting → what if we subsample the data, train multiple trees, and average their predictions (or take a majority vote)?
Implementation
For each tree: draw a bootstrap sample of the training data, fit the tree on that sample, then aggregate all trees' predictions by averaging (regression) or majority voting (classification), as in the sketch below.
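A minimal bagging sketch along these lines, assuming `DecisionTreeRegressor` as the base learner (random forests additionally subsample features at each split, which this sketch omits):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, n_trees=100, seed=0):
    """Train each tree on a bootstrap sample of the data."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def predict_bagged(trees, X):
    """Average the per-tree predictions (use a majority vote for classification)."""
    return np.mean([t.predict(X) for t in trees], axis=0)
```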
Serial (sequential) boosting: put more weight on wrongly classified instances, so each new learner focuses on the examples the previous ones got wrong.
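To make the reweighting concrete, a rough AdaBoost-style sketch (labels assumed to be in {-1, +1}; the specific update rule is the classic AdaBoost one, shown as one possible instantiation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """y must be in {-1, +1}; returns the stumps and their vote weights."""
    w = np.full(len(y), 1 / len(y))  # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error of this round (clipped to avoid division by zero).
        err = np.clip(w[pred != y].sum(), 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        # Up-weight the instances this stump got wrong, down-weight the rest.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```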
Intuition:
Recall that linear regression treats the residual as irreducible noise, i.e., something that cannot be predicted. GBMs, in contrast, treat the residual as something "learnable": each new model is fit to the residuals left by the current one.
$$ f_1(x) \approx y \newline f_2(x) \approx y-f_1(x) \newline f_3(x) \approx y-f_1(x)-f_2(x) \newline \dots \newline \hat{y}=f_1(x)+f_2(x)+f_3(x)+\dots $$
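A bare-bones sketch of this residual-fitting loop for squared loss; the learning rate and tree depth are assumptions added for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_trees=100, lr=0.1):
    """Each new tree is trained on the residuals left by the current ensemble."""
    pred = np.zeros(len(y), dtype=float)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                     # what the model still gets wrong
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        pred += lr * tree.predict(X)            # shrink each tree's contribution
        trees.append(tree)
    return trees

def gbm_predict(trees, X, lr=0.1):
    return lr * sum(tree.predict(X) for tree in trees)
```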