
1. Intuition

I want to apply the same logic as linear regression to binary classification problems. However, the target here is a probability, whose range is [0, 1], while a linear model's output ranges over (-inf, +inf). So I would like to transform the y value from [0, 1] to (-inf, +inf).

To achieve this, logistic regression converts the probability value to log odds:

$$ f(p)=\log\left(\frac{p}{1-p}\right) $$

Side notes: There are two crucial properties of log odds:

  1. When p = 0.5, the log odds is 0, so 0 works as a natural dividing threshold.
  2. Log odds turns the multiplicative (exponential) relationship between the odds and x into a linear one.
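The transform and its inverse (the sigmoid) can be sketched in a few lines; the function names here are my own:

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log odds in (-inf, +inf)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: map log odds back to a probability."""
    return 1 / (1 + math.exp(-z))

print(logit(0.5))                       # 0.0 -- p = 0.5 sits exactly at the threshold
print(round(sigmoid(logit(0.8)), 6))    # 0.8 -- the round trip recovers p
```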

2. Interpretation of Coefficients

To briefly recall linear regression, the coefficient values (beta) are slopes which indicate the change in the dependent variable y per unit change in the independent variable x.

In logistic regression, the coefficients are interpreted the same way as in linear regression, except that the y axis is the log odds rather than y itself. In this case, the coefficient values indicate the change in log(odds) per unit change in x.
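A quick numeric check of this interpretation, using hypothetical coefficient values I made up for illustration: a one-unit change in x adds b1 to the log odds, which multiplies the odds themselves by exp(b1) (the odds ratio).

```python
import math

# Hypothetical fitted model: log(odds) = b0 + b1 * x
b0, b1 = -1.5, 0.8

def log_odds(x):
    return b0 + b1 * x

# One unit increase in x adds b1 to the log odds (up to float rounding)
delta = log_odds(3.0) - log_odds(2.0)
print(round(delta, 6))

# ...which multiplies the odds by exp(b1), the odds ratio
odds_ratio = math.exp(log_odds(3.0)) / math.exp(log_odds(2.0))
print(round(odds_ratio, 6), round(math.exp(b1), 6))
```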


3. Finding the best-fit line

In logistic regression, we try to find the best-fit line that maximizes the log likelihood. The formula is shown below:

$$ LL=\sum_{i} y_i \ln(\hat{y}_i)+(1-y_i)\ln(1-\hat{y}_i) $$
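The formula can be computed directly; the toy labels and predicted probabilities below are made up for illustration:

```python
import math

def log_likelihood(y, y_hat):
    """LL = sum_i y_i*ln(y_hat_i) + (1 - y_i)*ln(1 - y_hat_i)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, y_hat))

y     = [1, 0, 1, 1]          # true binary labels
y_hat = [0.9, 0.2, 0.7, 0.6]  # predicted probabilities
ll = log_likelihood(y, y_hat)
print(ll)  # always <= 0; closer to 0 means a better fit
```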

Suppose we tried ordinary linear regression in log-odds space. The training labels are 0 and 1, which map to log odds of -inf and +inf, so we can't compute the residuals or the MSE there. Therefore, what we do instead is:

Repeat until convergence:

  1. Draw a candidate line and project all the points onto it.
  2. Convert the projected log(odds) to probability values.
  3. Compute the log likelihood.
  4. Take a gradient step (ascent on LL, equivalently descent on -LL) and update the coefficients.
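The loop above can be sketched for a 1-D toy dataset (the data, learning rate, and iteration budget are all my own assumptions, and the gradient of LL with respect to the coefficients is the familiar sum of (y - p) terms):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy 1-D data: x values and binary labels (made up for illustration)
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   1,   1]

b0, b1 = 0.0, 0.0   # intercept and slope, both start at 0
lr = 0.1            # learning rate (assumed)

for _ in range(5000):  # fixed budget instead of "repeat until convergence"
    # Steps 1-2: project each x onto the current line, squash to a probability
    ps = [sigmoid(b0 + b1 * x) for x in xs]
    # Step 4: gradient of the log likelihood w.r.t. b0 and b1
    g0 = sum(y - p for y, p in zip(ys, ps))
    g1 = sum((y - p) * x for y, p, x in zip(ys, ps, xs))
    b0 += lr * g0   # gradient *ascent*, since we maximize LL
    b1 += lr * g1

# The fitted model should classify x = 1.0 as class 0 and x = 3.0 as class 1
print(sigmoid(b0 + b1 * 1.0) < 0.5, sigmoid(b0 + b1 * 3.0) > 0.5)
```

Note that because this toy data is perfectly separable, the coefficients would grow without bound if the loop truly ran forever; a fixed iteration count (or a convergence check on LL) is what makes the procedure stop in practice.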