

Principal Component Analysis (PCA)

https://www.youtube.com/watch?v=TJdH6rPA-TI

- Finds the direction (dimension) along which the projected data has the largest spread (variance); this is equivalent to minimizing the projection (reconstruction) error

- Principal components are orthogonal

- PCA projects the data onto new axes, the principal components


Mathematically, PCA is performed by carrying out an eigen-decomposition of the covariance matrix.

PCA is a variance-maximizing technique that projects original data onto a direction that maximizes variance.

- Performs linear mapping of original data to a lower-dimensional space such that the variance of data in the low-dimensional representation is maximized.
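
A small numerical check of the variance-maximizing claim, as a sketch only: the 2-D data, the seed, and the names `w1`, `dirs` and `var_dirs` are made up for illustration. It compares the variance of the data projected onto the top eigenvector of the covariance matrix with the variance along a sweep of other unit directions.

```python
import numpy as np

rng = np.random.default_rng(5)  # fixed seed, only for repeatability
# Hypothetical correlated 2-D data
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.2], [0.0, 0.5]])
Xc = X - X.mean(axis=0)  # mean-center each feature

# eigh returns eigenvalues in ascending order, so the last column is the top direction
eigvals, W = np.linalg.eigh(np.cov(Xc, rowvar=False))
w1 = W[:, -1]
var_pc1 = (Xc @ w1).var(ddof=1)

# Variance of the projection onto 180 other unit directions
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
var_dirs = (Xc @ dirs.T).var(axis=0, ddof=1)

print(var_pc1, var_dirs.max())
```

No direction in the sweep should beat the first principal component, and the variance along it should equal the largest eigenvalue.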

A covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector.

https://www.geeksforgeeks.org/mathematics-covariance-and-correlation/

Covariance: 

Measures the joint variability of two random variables. Its sign gives the direction of the linear relationship between them; its magnitude depends on the units of the variables and can range from -infinity to +infinity.


Correlation:

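Covariance normalized by the standard deviations of the two variables, so it is unit-free and always lies between -1 and +1.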
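
To make the difference concrete, here is a minimal sketch with synthetic data (the variables `x` and `y`, the seed, and the factor 100 are assumptions chosen for illustration). It shows that rescaling a variable changes the covariance but leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for repeatability
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.5, size=200)  # hypothetical linearly related variable

cov_xy = np.cov(x, y)[0, 1]        # sample covariance (unbounded, unit-dependent)
corr_xy = np.corrcoef(x, y)[0, 1]  # Pearson correlation (always in [-1, 1])

# Correlation is covariance divided by both standard deviations
corr_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Rescaling x (e.g. a change of units) scales the covariance, not the correlation
cov_scaled = np.cov(100 * x, y)[0, 1]
corr_scaled = np.corrcoef(100 * x, y)[0, 1]

print(cov_xy, cov_scaled)
print(corr_xy, corr_scaled)
```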

 




Eigen-decomposition: https://www.youtube.com/watch?v=PFDu9oVAE-g

X is an n x m matrix: n = number of samples/observations, m = number of features

1.
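Covariance matrix = X^T X / (n - 1), where each column of X has been mean-centered first (the 1/(n - 1) factor only rescales the eigenvalues, not the eigenvectors)
Covariance matrix size = m x m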
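
Step 1 can be sketched in NumPy as follows. The random data and the sizes `n = 100`, `m = 3` are placeholders; the comparison with `np.cov` is only there as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, only for repeatability
n, m = 100, 3                   # hypothetical sizes: 100 samples, 3 features
X = rng.normal(size=(n, m))

Xc = X - X.mean(axis=0)         # mean-center each column (feature)
C = Xc.T @ Xc / (n - 1)         # m x m sample covariance matrix

# np.cov treats rows as variables by default, hence rowvar=False
C_np = np.cov(X, rowvar=False)

print(C.shape)
```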

2.
Obtain eigen pairs:
Eigen-decomposition gives a set of eigenvectors and their matching eigenvalues (the eigen pairs)

X^T X --> Decomposition --> W (eigenvectors, as columns), lambda (eigenvalues)
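
A sketch of step 2, assuming NumPy's `np.linalg.eigh` as the solver (a reasonable choice for a symmetric matrix, but not the only one). The data and mixing matrix are made up. Note that `eigh` returns eigenvalues in ascending order, so they are re-sorted here to put the largest first.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, only for repeatability
# Hypothetical data: 100 samples, 3 correlated features
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 0.2]])
X = rng.normal(size=(100, 3)) @ A
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (Xc.shape[0] - 1)

# eigh is meant for symmetric matrices; eigenvalues come back ascending
eigvals, W = np.linalg.eigh(C)

# Re-order so the first column of W pairs with the largest eigenvalue
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
W = W[:, order]

print(eigvals)
```

Each column `W[:, j]` satisfies `C @ W[:, j] == eigvals[j] * W[:, j]` up to rounding, and the columns are mutually orthogonal unit vectors.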

3.
T = XW
Columns of W are called loadings; W has m columns (one per principal component)
Columns of T are called scores
The columns are ordered by decreasing eigenvalue: the first column has the highest eigenvalue
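
A sketch of step 3 on made-up data, assuming X is mean-centered before projecting. It also checks two properties that follow from the construction: the variance of each score column equals its eigenvalue, and the score columns are uncorrelated with each other.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed, only for repeatability
# Hypothetical data: 200 samples, 4 features with different spreads
X = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)

C = np.cov(Xc, rowvar=False)
eigvals, W = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals, W = eigvals[order], W[:, order]

T = Xc @ W                               # scores: n x m
score_var = T.var(axis=0, ddof=1)        # variance of each score column

print(score_var)
print(eigvals)
```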

4.
Depending on how many dimensions you want, keep only the first r columns of W:
T_r = X W_r  (r = number of principal components kept, r <= m; W_r is m x r, T_r is n x r)
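
Putting the four steps together, a minimal sketch: the helper name `pca_project`, the random data, and the choice `r = 2` are all hypothetical. The returned `explained` value is the fraction of total variance carried by the kept components, which is one common way to pick r.

```python
import numpy as np

def pca_project(X, r):
    """Hypothetical helper: project X (n x m) onto its first r principal components."""
    Xc = X - X.mean(axis=0)                    # step 0: mean-center
    C = Xc.T @ Xc / (X.shape[0] - 1)           # step 1: m x m covariance matrix
    eigvals, W = np.linalg.eigh(C)             # step 2: eigen pairs (ascending)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    eigvals, W = eigvals[order], W[:, order]
    W_r = W[:, :r]                             # step 4: keep the first r loadings
    T_r = Xc @ W_r                             # step 3: scores, n x r
    explained = eigvals[:r].sum() / eigvals.sum()
    return T_r, W_r, explained

rng = np.random.default_rng(4)                 # fixed seed, only for repeatability
X = rng.normal(size=(150, 5))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=150)  # make one feature nearly redundant

T_r, W_r, explained = pca_project(X, r=2)
print(T_r.shape, W_r.shape, explained)
```

In practice a library routine such as scikit-learn's `PCA` would normally be used instead; this version only mirrors the steps in the notes.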