https://ddongwon.tistory.com/114

1. What is PCA for?

PCA (Principal Component Analysis) is a technique for dimensionality reduction. With PCA, we can map the data into a space with lower dimensionality.

The catch is that dimensionality reduction inevitably loses some information. The question, then, is how to reduce the dimensionality while losing as little information as possible. PCA is designed to tackle this problem.

2. Intuitive Understanding of PCA

Say we have a 2-dimensional tabular dataset, as shown below:

스크린샷 2023-11-09 오후 5.21.09.png

  1. We compute the mean point of the data and shift the data so that this mean point sits at the origin.

스크린샷 2023-11-09 오후 5.22.22.png
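The centering step can be sketched in a few lines of NumPy. The data values and the names `X`, `mean_point`, and `X_centered` are made up for illustration; they are not taken from the screenshots.

```python
import numpy as np

# Hypothetical 2-D data: one row per sample, columns are x and y.
X = np.array([
    [10.0, 6.0],
    [11.0, 4.0],
    [8.0, 5.0],
    [3.0, 3.0],
    [2.0, 2.8],
    [1.0, 1.0],
])

# Mean point of the data (one mean per column).
mean_point = X.mean(axis=0)

# Shift every sample so that the mean point lands on the origin.
X_centered = X - mean_point

print("mean point:", mean_point)
print("mean after centering:", X_centered.mean(axis=0))
```

Centering only moves the data; the relative positions of the points do not change.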

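  1. We find the vector through the origin that maximizes the sum of squares (SS).

    As shown below, for a candidate vector, we project each data point onto the vector and measure the distance from the origin to the projected point. We then sum the squares of these distances over all data points. The vector that maximizes this sum of squares is the one we are looking for. Since each point's distance to the origin is fixed, maximizing the projected distances is the same as minimizing the perpendicular distances from the points to the vector.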

스크린샷 2023-11-09 오후 5.23.20.png
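One way to picture this search is a brute-force scan over candidate directions. The sketch below assumes the same made-up data as before and a hypothetical helper `sum_of_squares`; in practice the best direction is usually obtained from an SVD or eigendecomposition rather than by scanning, so the SVD result is computed as well for comparison.

```python
import numpy as np

# Hypothetical 2-D data, centered at the origin.
X = np.array([
    [10.0, 6.0],
    [11.0, 4.0],
    [8.0, 5.0],
    [3.0, 3.0],
    [2.0, 2.8],
    [1.0, 1.0],
])
Xc = X - X.mean(axis=0)


def sum_of_squares(Xc, angle):
    """SS of the projected distances for the unit vector at `angle` radians."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    # Signed distance from the origin to each projected point.
    projections = Xc @ direction
    return np.sum(projections ** 2)


# Try candidate directions from 0 to 180 degrees in 0.1 degree steps.
angles = np.linspace(0.0, np.pi, 1801)
ss_values = np.array([sum_of_squares(Xc, a) for a in angles])
best_angle = angles[np.argmax(ss_values)]
pc1_scan = np.array([np.cos(best_angle), np.sin(best_angle)])

# Cross-check: the first right-singular vector of the centered data.
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_svd = Vt[0]

print("scan:", pc1_scan)
print("svd: ", pc1_svd)  # may point the opposite way; the sign is arbitrary
```

The two vectors should agree up to sign and up to the resolution of the scan.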

  1. Set the optimal line we found as PC1, and record its loading scores.

    In this example, the unit vector along PC1 has components (0.97, 0.242) along the x and y axes. These components are the loading scores of PC1: they tell us how much of each original axis goes into PC1.

스크린샷 2023-11-09 오후 5.44.08.png
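As a small worked example of where such numbers can come from: if PC1 were a line with slope 0.25 (an assumption for illustration; the slope is not stated above), scaling its direction to unit length gives loading scores very close to the ones quoted.

```python
import numpy as np

# Assumed for illustration: PC1 has slope 0.25, i.e. it moves
# 4 units along x for every 1 unit along y.
direction = np.array([4.0, 1.0])

# Scale to unit length; the components are the loading scores.
loadings = direction / np.linalg.norm(direction)

print(np.round(loadings, 3))
```

This gives roughly (0.970, 0.243). The squared loading scores of a component always sum to 1, because the vector has unit length.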

  1. Draw PC2, the vector orthogonal to PC1. Then rotate the diagram so that PC1 and PC2 become the new axes, and draw the scree plot.

스크린샷 2023-11-09 오후 5.54.18.png

스크린샷 2023-11-09 오후 5.54.54.png
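In 2-D, PC2 follows directly from PC1, and the "rotation" is just a change of coordinates. The following is a minimal sketch on the same made-up data; the names `components` and `X_rotated` are hypothetical, and PC1 is taken from an SVD here rather than from the search described earlier.

```python
import numpy as np

# Hypothetical 2-D data, centered at the origin.
X = np.array([
    [10.0, 6.0],
    [11.0, 4.0],
    [8.0, 5.0],
    [3.0, 3.0],
    [2.0, 2.8],
    [1.0, 1.0],
])
Xc = X - X.mean(axis=0)

# PC1: first right-singular vector of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

# In 2-D, turning PC1 by 90 degrees gives a unit vector orthogonal to it.
pc2 = np.array([-pc1[1], pc1[0]])

# Express every point in the (PC1, PC2) coordinate system:
# column 0 is the position along PC1, column 1 the position along PC2.
components = np.vstack([pc1, pc2])
X_rotated = Xc @ components.T

print(X_rotated)
```

The rotation does not change any distances between points; it only changes the axes used to describe them.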

  1. Compute each axis's share of the total SS. This share represents how much of the information (the variation in the data) that component retains. In this example, PC1 retains 89% of the information and PC2 retains 11%.

스크린샷 2023-11-09 오후 6.10.17.png
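The ratio can be computed directly from the projected coordinates. The sketch below again uses made-up data, so its output will not match the 89% / 11% of the example; `scores`, `ss`, and `ratio` are illustrative names.

```python
import numpy as np

# Hypothetical 2-D data, centered at the origin.
X = np.array([
    [10.0, 6.0],
    [11.0, 4.0],
    [8.0, 5.0],
    [3.0, 3.0],
    [2.0, 2.8],
    [1.0, 1.0],
])
Xc = X - X.mean(axis=0)

# Rows of Vt are the principal directions (PC1 first, then PC2).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Coordinates of every point along PC1 and PC2.
scores = Xc @ Vt.T

# Sum of squared projected distances per component, and each one's share.
ss = np.sum(scores ** 2, axis=0)
ratio = ss / ss.sum()

print(np.round(ratio, 3))
```

Dividing each SS by n - 1 turns it into a variance, which leaves the ratios unchanged. These ratios are the bar heights of the scree plot.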

  1. Project the data onto the lower-dimensional space.

    스크린샷 2023-11-09 오후 6.13.38.png

    In this example, we reduce the dimensionality from 2 to 1.

    With 3-dimensional data, there is more than one option. Say we have PC1 (70%), PC2 (20%), and PC3 (10%); then we have 2 choices:

    1. 3D to 2D: PC1 and PC2 (70% + 20%=90%)
    2. 3D to 1D: PC1 only (70%)
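The choice between these options can be sketched as picking the smallest number of components whose cumulative ratio reaches a target. The helper `components_needed`, the target values, and the random 3-D data below are assumptions for illustration only.

```python
import numpy as np

# Retention ratios from the 3-D example: PC1, PC2, PC3.
ratios = np.array([0.70, 0.20, 0.10])
cumulative = np.cumsum(ratios)


def components_needed(cumulative, target, tol=1e-9):
    """Smallest number of leading PCs whose cumulative ratio reaches `target`."""
    # `tol` guards against floating-point round-off (0.7 + 0.2 is slightly below 0.9).
    return int(np.argmax(cumulative >= target - tol)) + 1


print(components_needed(cumulative, 0.90))  # keep PC1 and PC2
print(components_needed(cumulative, 0.70))  # keep PC1 only

# Projection itself, on hypothetical 3-D data (50 samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.5, 0.5])
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

X_2d = Xc @ Vt[:2].T  # 3D to 2D: coordinates along PC1 and PC2
X_1d = Xc @ Vt[:1].T  # 3D to 1D: coordinates along PC1 only

print(X_2d.shape, X_1d.shape)
```

Which target to use is a judgment call that depends on how much information loss is acceptable for the task.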