Say we train a prediction model as such:
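The original model isn't shown here, so below is a minimal sketch under some assumptions: a hypothetical toy dataset with Age, Gender, and Occupation features, and a tree-based classifier predicting whether a person likes computer games. The data values and encodings are made up for illustration.

```python
# A minimal sketch (hypothetical data, for illustration only): predict
# whether a person likes computer games from Age, Gender, and Occupation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

X = pd.DataFrame({
    "Age":        [25, 50, 34, 19, 62, 41],
    "Gender":     [0, 1, 0, 1, 0, 1],   # encoded: 0 = female, 1 = male
    "Occupation": [2, 4, 1, 2, 3, 0],   # label-encoded occupation
})
y = [1, 1, 0, 1, 0, 0]                  # 1 = likes computer games

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
```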

There are two different ways to interpret such a model.
On a global level, we can check which features were the most predictive. There are numerous ways to measure this, such as the coefficients in a linear regression model or the number of splits a feature is used for in tree-based models.
On a local level, we can check the HETEROGENEITY of different features. For example, age may be the most important feature across all individuals, but for Frank, a 50-year-old who works as a video game tester, his occupation is going to be much more significant than his age in determining whether he likes computer games. Identifying which features were most important for Frank specifically means finding feature importances on a ‘local’ – individual – level.
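For instance, with the shap library (assuming it is installed; `frank` is a hypothetical row matching the toy encoding above), the local contributions can be sketched like this:

```python
import shap

# Frank: a 50-year-old video game tester (hypothetical encoding).
frank = pd.DataFrame({"Age": [50], "Gender": [1], "Occupation": [4]})

explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(frank)
# Depending on the shap version, `phi` is a list of per-class arrays or a
# 3-D array; each entry is one feature's contribution to Frank's prediction.
print(phi)
```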
The mathematical notation for this locality is shown below:
$$ g_{\text{Frank}} = \phi_{\text{Frank,Age}} + \phi_{\text{Frank,Gender}} + \phi_{\text{Frank,Occupation}} $$
Note that I don’t multiply the ϕ values by the corresponding feature values x. Instead, each ϕ is multiplied by 1 if the feature is present, and by 0 if it is not.
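In general form, this is the additive explanation model from the SHAP paper, where $z'_i \in \{0, 1\}$ indicates whether feature $i$ is present; the full form also includes a base value $\phi_0$, which the simplified Frank equation above leaves out:

$$ g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i $$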
SHAP is based on game theory. Consider the following scenario: a group of people are playing a game. As a result of playing this game, they receive a certain reward; how can they divide this reward among themselves in a way that reflects each of their contributions?
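To make this concrete, here is a toy brute-force Shapley computation. The coalition payoffs in `v` are made-up numbers for illustration, not anything derived from a real model:

```python
from itertools import permutations

players = ["A", "B", "C"]
# Hypothetical payoffs v(S): the reward each coalition would earn alone.
v = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 55, frozenset("BC"): 65,
    frozenset("ABC"): 90,
}

# Shapley value: average each player's marginal contribution over all
# orders in which the players could have joined the game.
shapley = {p: 0.0 for p in players}
orders = list(permutations(players))
for order in orders:
    coalition = frozenset()
    for p in order:
        shapley[p] += v[coalition | {p}] - v[coalition]
        coalition = coalition | {p}
shapley = {p: total / len(orders) for p, total in shapley.items()}

print(shapley)                # each player's share of the reward
print(sum(shapley.values()))  # the shares sum to v(ABC) = 90
```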
There are a few things which everyone can agree on; meeting the following conditions will mean the game is ‘fair’ according to SHAP values:

- Efficiency: the players’ individual contributions add up exactly to the total reward.
- Symmetry: two players who contribute the same amount to every coalition receive the same share.
- Dummy: a player who adds nothing to any coalition receives nothing.
- Additivity: if two games are combined, a player’s contribution in the combined game is the sum of their contributions in each game.
Translating these conditions into the previous notation, we get: