
1. Intuition

I want to apply the same logic as linear regression to binary classification problems. However, the target here is a probability, whose range is [0, 1], while a linear model's output ranges over (-inf, +inf). So I would like to transform the y value from [0, 1] to (-inf, +inf).

To achieve this, logistic regression converts the probability value to log odds:

$$ f(p)=\log\left(\frac{p}{1-p}\right) $$

Side notes: There are two crucial properties of log odds:

  1. When p = 0.5, the log odds is 0, so 0 works as a natural dividing threshold.
  2. Log odds turns the multiplicative (exponential) relationship between the odds and x into a linear one.
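The transform and its inverse (the sigmoid) can be sketched in a few lines; the function names here are my own:

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log odds in (-inf, +inf)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: map log odds back to a probability."""
    return 1 / (1 + math.exp(-z))

print(logit(0.5))                       # 0.0 -- p = 0.5 sits exactly at the threshold
print(round(sigmoid(logit(0.8)), 6))    # 0.8 -- the round trip recovers p
```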

2. Interpretation of Coefficients

To briefly recall linear regression, the coefficient values (beta) are slopes which indicate the change in the dependent variable y per unit change in the independent variable x.

In logistic regression, the coefficients are interpreted the same way as in linear regression, except that the y axis is the log odds rather than y itself. In this case, the coefficient values indicate the change in log(odds) per unit change in x.
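A quick numeric check of this interpretation, using hypothetical coefficient values I made up for illustration: a one-unit change in x adds b1 to the log odds, which multiplies the odds themselves by exp(b1) (the odds ratio).

```python
import math

# Hypothetical fitted model: log(odds) = b0 + b1 * x
b0, b1 = -1.5, 0.8

def log_odds(x):
    return b0 + b1 * x

# One unit increase in x adds b1 to the log odds (up to float rounding)
delta = log_odds(3.0) - log_odds(2.0)
print(round(delta, 6))

# ...which multiplies the odds by exp(b1), the odds ratio
odds_ratio = math.exp(log_odds(3.0)) / math.exp(log_odds(2.0))
print(round(odds_ratio, 6), round(math.exp(b1), 6))
```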


3. Finding the best-fit line

In logistic regression, we try to find the best-fit line that maximizes the log likelihood. The formula is shown below:

$$ LL=\sum_{i} y_i \ln(\hat{y}_i)+(1-y_i)\ln(1-\hat{y}_i) $$
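The formula can be computed directly; the toy labels and predicted probabilities below are made up for illustration:

```python
import math

def log_likelihood(y, y_hat):
    """LL = sum_i y_i*ln(y_hat_i) + (1 - y_i)*ln(1 - y_hat_i)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, y_hat))

y     = [1, 0, 1, 1]          # true binary labels
y_hat = [0.9, 0.2, 0.7, 0.6]  # predicted probabilities
ll = log_likelihood(y, y_hat)
print(ll)  # always <= 0; closer to 0 means a better fit
```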

Suppose we tried ordinary linear regression in log-odds space. The training labels are 0 and 1, which map to log odds of -inf and +inf, so we can't compute the residuals or the MSE there. Therefore, what we do instead is:

Repeat until convergence:

  1. Draw a candidate line and project all the points onto it.
  2. Convert the projected log(odds) to probability values.
  3. Compute the log likelihood.
  4. Take a gradient step (ascent on LL, equivalently descent on -LL) and update the coefficients.
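The loop above can be sketched for a 1-D toy dataset (the data, learning rate, and iteration budget are all my own assumptions, and the gradient of LL with respect to the coefficients is the familiar sum of (y - p) terms):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy 1-D data: x values and binary labels (made up for illustration)
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   1,   1]

b0, b1 = 0.0, 0.0   # intercept and slope, both start at 0
lr = 0.1            # learning rate (assumed)

for _ in range(5000):  # fixed budget instead of "repeat until convergence"
    # Steps 1-2: project each x onto the current line, squash to a probability
    ps = [sigmoid(b0 + b1 * x) for x in xs]
    # Step 4: gradient of the log likelihood w.r.t. b0 and b1
    g0 = sum(y - p for y, p in zip(ys, ps))
    g1 = sum((y - p) * x for y, p, x in zip(ys, ps, xs))
    b0 += lr * g0   # gradient *ascent*, since we maximize LL
    b1 += lr * g1

# The fitted model should classify x = 1.0 as class 0 and x = 3.0 as class 1
print(sigmoid(b0 + b1 * 1.0) < 0.5, sigmoid(b0 + b1 * 3.0) > 0.5)
```

Note that because this toy data is perfectly separable, the coefficients would grow without bound if the loop truly ran forever; a fixed iteration count (or a convergence check on LL) is what makes the procedure stop in practice.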