1. Coefficients:

In linear models (such as linear regression and logistic regression), the magnitude of the coefficients can be an indicator of feature importance.

Consider a simple regression model:

$$ Income = \alpha+\beta(EducationLevel)+\theta(Sex) $$

The coefficients $\beta$ and $\theta$ indicate the marginal change in the dependent variable for a unit change in the corresponding independent variable, holding the other variables fixed. Note that independence of the variables matters: strong correlation between predictors (multicollinearity) degrades the interpretability of the coefficients.
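
As a rough illustration, the sketch below fits a linear regression on synthetic data loosely following the income equation above and prints the fitted coefficients. The data-generating process and the coefficient values are assumptions made purely for demonstration, not part of the original example.

```python
# Minimal sketch: read coefficients of a linear model as importance scores.
# The synthetic data below is hypothetical and only mimics the income example.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1_000
education_level = rng.integers(8, 21, size=n)   # hypothetical years of education
sex = rng.integers(0, 2, size=n)                # hypothetical binary indicator
income = 5_000 + 2_000 * education_level + 3_000 * sex + rng.normal(0, 1_000, size=n)

X = np.column_stack([education_level, sex])
model = LinearRegression().fit(X, income)

# Each coefficient estimates the change in income for a one-unit change
# in the corresponding feature, holding the other feature fixed.
for name, coef in zip(["education_level", "sex"], model.coef_):
    print(f"{name}: {coef:,.0f}")
```

Keep in mind that raw coefficients are only comparable as importance scores when the features are on similar scales (or have been standardized first).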

For example, in the income regression above, if education level and sex had a strong positive correlation, the model could not cleanly separate their individual effects on income; the estimated coefficients would become unstable, and they could no longer correctly reflect each feature's importance.
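
A small sketch of this failure mode, using synthetic data of my own construction rather than the article's: two nearly identical features are generated, and the fitted coefficients split the true effect between them differently on every regenerated dataset, even though only one feature actually drives the target.

```python
# Minimal sketch: multicollinearity makes individual coefficients unstable.
import numpy as np
from sklearn.linear_model import LinearRegression

n = 500
for seed in range(3):
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)      # x2 is almost a copy of x1
    y = 2.0 * x1 + rng.normal(scale=0.1, size=n)  # only x1 truly drives y
    coefs = LinearRegression().fit(np.column_stack([x1, x2]), y).coef_
    # The individual coefficients swing from run to run, but their sum stays near 2.
    print(np.round(coefs, 2), "sum:", round(float(coefs.sum()), 2))
```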

2. Permutation Importance:

There are two different approaches:

  1. Shuffle the training set
  2. Shuffle the validation set

2.1. Shuffle the training set

This involves randomly shuffling a single column of the training dataset, leaving the target and all other columns in place, and then re-evaluating the already-trained model on the shuffled data. If the model's accuracy decreases significantly, it suggests the feature is important.


The process consists of the following steps:

  1. Train your model on the dataset as usual.
  2. Evaluate the model and record the score (accuracy, F1, etc.).
  3. For each feature you want to test: shuffle that feature's column, re-evaluate the trained model on the shuffled data, and record the new score.
  4. The importance of a feature is determined by how much the score (e.g., accuracy) decreases when the feature is shuffled (see the code sketch below).
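
The sketch below is one possible implementation of these steps; the synthetic dataset and the random-forest model are assumptions for illustration, not part of the original article. It scores the already-trained model on the training split, matching this subsection, and the same loop works unchanged on a held-out validation split.

```python
# Minimal sketch of permutation importance: shuffle one column at a time
# and measure how much the score of the already-trained model drops.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X_train, y_train = make_classification(n_samples=1_000, n_features=5,
                                        n_informative=3, random_state=0)

# 1. Train the model as usual and 2. record the baseline score.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = accuracy_score(y_train, model.predict(X_train))

rng = np.random.default_rng(0)
for j in range(X_train.shape[1]):
    # 3. Shuffle a single column, leaving the target and all other columns
    #    in place, then re-evaluate the trained model on the shuffled data.
    X_shuffled = X_train.copy()
    X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
    score = accuracy_score(y_train, model.predict(X_shuffled))
    # 4. Importance = how much the score drops when this feature is shuffled.
    print(f"feature {j}: importance = {baseline - score:.3f}")
```

scikit-learn ships this procedure as `sklearn.inspection.permutation_importance`, which repeats the shuffling several times per feature and averages the resulting score drops.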

Pro: Intuitive