PCA (Principal Component Analysis) (feat. sklearn)

Prerequisite

Singular value decomposition

  • Singular value decomposition (SVD) of $X\in\mathbb{R}^{n\times p}$
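    • Written out (the shapes below follow the full-SVD convention, which is also what np.linalg.svd uses by default):
      $$X = UDV^{\top},\qquad U\in\mathbb{R}^{n\times n},\ D\in\mathbb{R}^{n\times p},\ V\in\mathbb{R}^{p\times p}$$
      where $U$ and $V$ are orthogonal and $D$ carries the singular values $d_1\ge d_2\ge\dots\ge 0$ on its diagonal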

sklearn

  • breast cancer data
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
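  • The snippets below use X_scaled, which is not defined here; a minimal sketch, assuming the features are standardized (zero mean, unit variance) before PCA, as is conventional:
import numpy as np
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
X_scaled = StandardScaler().fit_transform(X)  # columns now have zero mean and unit variance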

Eigenvectors

  • SVD and Eigendecomposition
    • Note that the columns of $V$ are the eigenvectors of $X^{\top}X$ (with eigenvalues $d_i^2$, the squared singular values), since
      $$\begin{aligned} X^{\top}X &= (UDV^{\top})^{\top}(UDV^{\top}) \\ &= VD^{\top}U^{\top}UDV^{\top} \\ &= VD^{\top}DV^{\top} \quad (\because U \text{ is orthogonal}) \end{aligned}$$
  • In sklearn,
from sklearn.decomposition import PCA
n_comp=3
pca=PCA(n_components=n_comp)
pca.fit(X_scaled)
  • Then, pca.components_ gives the first n_comp rows of $V^{\top}$, i.e. the leading eigenvectors of $X^{\top}X$
  • Using svd in numpy,
import numpy as np
U, S, VT = np.linalg.svd(X_scaled)
  • Check that the two results below give the same values up to sign! (each singular vector is only determined up to a sign flip, so sklearn and numpy may disagree on some rows)
pca.components_ # sklearn
VT[:n_comp] # svd
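  • A sign-aware way to check this programmatically (a sketch: each numpy row is flipped to match sklearn's sign choice before comparing):
signs = np.sign(np.sum(VT[:n_comp] * pca.components_, axis=1, keepdims=True))
print(np.allclose(VT[:n_comp] * signs, pca.components_))  # True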

Principal components

  • The columns of $UD$ are called the principal components of $X$
  • The principal components of a collection of points in a real coordinate space are a sequence of $p$ unit vectors, where the $i$-th vector is the direction of a line that best fits the data while being orthogonal to the first $i-1$ vectors
  • In sklearn,
pca_fit_transform = pca.fit_transform(X_scaled)
  • Note that
    $$\begin{aligned} XV &= UDV^{\top}V \\ &= UD \quad (\because V \text{ is orthogonal}) \end{aligned}$$
  • So, the SVD in numpy also gives the principal components!
(X_scaled).dot(pca.components_.T)
  • Check that the two results below give the same values!
pca_fit_transform # sklearn
(X_scaled).dot(pca.components_.T) # svd
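  • The same matrix can also be built directly from numpy's SVD output as $UD$; a sketch (columns are sign-aligned before comparing, for the same reason as above):
UD = U[:, :n_comp] * S[:n_comp]  # first n_comp columns of U, scaled by the singular values
signs = np.sign(np.sum(UD * pca_fit_transform, axis=0))
print(np.allclose(UD * signs, pca_fit_transform))  # True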

Projection of data onto the principal components

  • In sklearn,
pca_inverse_transform = pca.inverse_transform(pca_fit_transform)
  • Note that $XVV^{\top}$ gives the projection of $X$ onto the span of the columns of $V$
    • Let the $i$-th column vector of $V$ be $v_i\in\mathbb{R}^{p}$
    • $VV^{\top}$ is the sum of the projection matrices onto the eigenvectors $v_1,\dots,v_p$, i.e. $VV^{\top}=\sum_{i=1}^{p}v_iv_i^{\top}$ (here $V$ is truncated to its first n_comp columns, so this is a rank-n_comp projection rather than the identity)
  • Using svd in numpy,
pca_fit_transform.dot(pca.components_)

or

(X_scaled).dot(pca.components_.T).dot(pca.components_)
  • Check that the three results below give the same values!
pca_inverse_transform # sklearn
pca_fit_transform.dot(pca.components_) # svd
(X_scaled).dot(pca.components_.T).dot(pca.components_) # svd
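  • One detail behind this equality: pca.inverse_transform re-adds the per-feature means, so the exact identity is the one below; since X_scaled is standardized, pca.mean_ is numerically zero and all three expressions agree:
print(np.allclose(pca_inverse_transform,
                  pca_fit_transform.dot(pca.components_) + pca.mean_))  # True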

  • Wrap-up
    • Given that
    from sklearn.decomposition import PCA
    n_comp=3
    pca=PCA(n_components=n_comp)
    pca.fit(X_scaled)
| Item | PCA in sklearn | SVD in numpy |
| --- | --- | --- |
| Eigenvectors | pca.components_ | VT[:n_comp] from U, S, VT = np.linalg.svd(X_scaled) |
| Principal components | pca.fit_transform(X_scaled) | (X_scaled).dot(pca.components_.T) |
| Projection onto the principal components | pca.inverse_transform(pca_fit_transform) | (X_scaled).dot(pca.components_.T).dot(pca.components_) |
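  • A self-contained sketch that verifies every row of the table in one run (sign-aware; assumes the breast cancer setup from the top of the post):
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_breast_cancer().data)
n_comp = 3
pca = PCA(n_components=n_comp)
pca_fit_transform = pca.fit_transform(X_scaled)
U, S, VT = np.linalg.svd(X_scaled)

# Eigenvectors agree up to sign
signs = np.sign(np.sum(VT[:n_comp] * pca.components_, axis=1, keepdims=True))
assert np.allclose(VT[:n_comp] * signs, pca.components_)
# Principal components: XV = UD
assert np.allclose(pca_fit_transform, X_scaled.dot(pca.components_.T))
# Projection onto the principal components (pca.mean_ is ~0 after standardization)
assert np.allclose(pca.inverse_transform(pca_fit_transform),
                   X_scaled.dot(pca.components_.T).dot(pca.components_))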

PCA projection recovery process

from sklearn.decomposition import PCA

n_comp = 330
pca = PCA(n_components=n_comp)
pca_fit_transform = pca.fit_transform(R.T)  # R: data matrix defined elsewhere, transposed so rows are samples
pca_inverse_transform = pca.inverse_transform(pca_fit_transform)
  • Additional eigenvalues
    $$\tilde{e}\sim N(\mu_{\mathsf{W}}, \Sigma_{\mathsf{W}})$$
    where $\hat{\mu}_{\mathsf{W}}=\frac{1}{S}\sum_{s=1}^{S}e_s$ and, as in the code below, $\hat{\Sigma}_{\mathsf{W}}$ is the sample covariance of the $e_s$
mu_hat_for_EV = list(map(lambda x: np.mean(x), COMPONENTS))  # mean of each row of COMPONENTS
Sigma_hat_for_EV = np.cov(COMPONENTS)  # sample covariance (rows as variables, columns as observations)

S_new = 500
W_prime = np.random.multivariate_normal(mu_hat_for_EV, Sigma_hat_for_EV, S_new)  # draw S_new samples from the fitted Gaussian
generated = np.matmul(pca_inverse_transform, W_prime.T)
