PCA (Principal Component Analysis) (feat. sklearn)
Pre-requisite
Singular value decomposition
- The singular value decomposition (SVD) of $X$ is $X = U \Sigma V^T$
- $U$: unitary orthogonal matrix whose columns are the left singular vectors of $X$
- $\Sigma$: rectangular diagonal matrix whose diagonal elements $\sigma_i$ are the singular values of $X$ (the square roots of the eigenvalues of $X^T X$)
- $V$: unitary orthogonal matrix whose columns are the right singular vectors of $X$
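As a quick sanity check, here is a minimal numpy sketch (with a hypothetical random matrix) verifying that the three factors reassemble into $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))            # hypothetical data matrix

U, S, VT = np.linalg.svd(X)            # S holds the singular values
Sigma = np.zeros((6, 4))               # embed S into a rectangular diagonal matrix
Sigma[:4, :4] = np.diag(S)

print(np.allclose(X, U @ Sigma @ VT))  # True: X = U Σ V^T
```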
sklearn
- breast cancer data
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
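The snippets below use a variable `X_scaled` that the post never defines; presumably it is the standardized feature matrix. A minimal sketch of that assumption, using `StandardScaler`:

```python
from sklearn.preprocessing import StandardScaler

# Assumption: X_scaled is the breast cancer feature matrix,
# standardized to zero mean and unit variance per feature
X_scaled = StandardScaler().fit_transform(cancer.data)
```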
Eigenvectors
- SVD and eigendecomposition
- Note that the columns of $V$ are the eigenvectors of $X^T X$, since $X^T X = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T$, and $\Sigma^T \Sigma$ is diagonal with entries $\sigma_i^2$
- In `sklearn`,

from sklearn.decomposition import PCA

n_comp = 3
pca = PCA(n_components=n_comp)
pca.fit(X_scaled)
- Then, `pca.components_` gives $V^T$, whose rows are the eigenvectors of $X^T X$
- Using `svd` in numpy,

import numpy as np

U, S, VT = np.linalg.svd(X_scaled)
- Check that the two results below agree up to sign!
pca.components_ # sklearn
VT[:n_comp] # svd
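Since each singular vector is only determined up to its sign, a direct `np.allclose` on the two arrays may fail; a sign-insensitive check (a small sketch reusing the variables above):

```python
import numpy as np

# Compare absolute values, since each row may differ by a factor of -1
print(np.allclose(np.abs(pca.components_), np.abs(VT[:n_comp])))  # True
```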
Principal components
- The columns of $XV$ are called the principal components of $X$
- The principal components of a collection of points in a real coordinate space are a sequence of $p$ unit vectors, where the $i$-th vector is the direction of a line that best fits the data while being orthogonal to the first $i-1$ vectors
- In `sklearn`,

pca_fit_transform = pca.fit_transform(X_scaled)
- Note that $XV = U \Sigma V^T V = U \Sigma$
- So, `svd` in numpy also gives the principal components!

(X_scaled).dot(pca.components_.T)
- Check that the two results below give the same values!
pca_fit_transform # sklearn
(X_scaled).dot(pca.components_.T) # svd
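Since $XV = U\Sigma$, the principal components can also be read directly off the SVD factors; a small sketch (sign-insensitive for the reason noted earlier):

```python
import numpy as np

# Columns of U scaled by the singular values give U Σ = X V
pc_from_svd = U[:, :n_comp] * S[:n_comp]
print(np.allclose(np.abs(pca_fit_transform), np.abs(pc_from_svd)))  # True
```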
Projection of data onto the principal components
- In `sklearn`,

pca_inverse_transform = pca.inverse_transform(pca_fit_transform)
- Note that $X V V^T$ gives the projection of $X$ onto the subspace spanned by the eigenvectors
- $V V^T$ is the sum of the projection matrices onto the individual eigenvectors
- Let the $i$-th column vector of $V$ be $v_i$; then $V V^T = \sum_i v_i v_i^T$, where $v_i v_i^T$ is the projection matrix onto $v_i$ (a short numerical check appears at the end of this section)
- Using `svd` in numpy,

pca_fit_transform.dot(pca.components_)
or

(X_scaled).dot(pca.components_.T).dot(pca.components_)

- Check that the three results below give the same values!

pca_inverse_transform                                    # sklearn
pca_fit_transform.dot(pca.components_)                   # svd
(X_scaled).dot(pca.components_.T).dot(pca.components_)  # svd
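Here is the short numerical check of the projection-matrix identity $V V^T = \sum_i v_i v_i^T$ promised above (reusing `pca` fitted earlier; the rows of `pca.components_` are the $v_i$):

```python
import numpy as np

# Sum of rank-1 projectors v_i v_i^T over the eigenvectors
P = sum(np.outer(v, v) for v in pca.components_)
print(np.allclose(P, pca.components_.T.dot(pca.components_)))  # True: equals V V^T
```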
- Wrap-up
- Given that

from sklearn.decomposition import PCA

n_comp = 3
pca = PCA(n_components=n_comp)
pca.fit(X_scaled)
| Item | PCA in sklearn | svd in numpy |
|---|---|---|
| Eigenvectors | `pca.components_` | `VT[:n_comp]` from `U, S, VT = np.linalg.svd(X_scaled)` |
| Principal components | `pca.fit_transform(X_scaled)` | `(X_scaled).dot(pca.components_.T)` |
| Projection onto the principal components | `pca.inverse_transform(pca_fit_transform)` | `(X_scaled).dot(pca.components_.T).dot(pca.components_)` |
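Putting the table together, a minimal end-to-end sketch (assuming `X_scaled` as defined above; the last equality relies on `X_scaled` being centered, because `inverse_transform` adds the fitted mean back):

```python
import numpy as np
from sklearn.decomposition import PCA

n_comp = 3
pca = PCA(n_components=n_comp)
pca_fit_transform = pca.fit_transform(X_scaled)
U, S, VT = np.linalg.svd(X_scaled)

# Eigenvectors agree up to sign
print(np.allclose(np.abs(pca.components_), np.abs(VT[:n_comp])))
# Principal components
print(np.allclose(pca_fit_transform, X_scaled.dot(pca.components_.T)))
# Projection onto the principal components (X_scaled has zero column means)
print(np.allclose(pca.inverse_transform(pca_fit_transform),
                  X_scaled.dot(pca.components_.T).dot(pca.components_)))
```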
PCA projection recovery process
from sklearn.decomposition import PCA

# R is the data matrix from the author's surrounding project (not defined in this post)
n_comp = 330
pca = PCA(n_components=n_comp)
pca_fit_transform = pca.fit_transform(R.T)
pca_inverse_transform = pca.inverse_transform(pca_fit_transform)
- Additional eigenvalues: draw $W' \sim \mathcal{N}(\hat{\mu}, \hat{\Sigma})$, where $\hat{\mu}$ and $\hat{\Sigma}$ are the sample mean and covariance estimated from the components
# COMPONENTS is assumed from the author's surrounding project (e.g., the matrix of component scores)
mu_hat_for_EV = list(map(lambda x: np.mean(x), COMPONENTS))
Sigma_hat_for_EV = np.cov(COMPONENTS)
S_new = 500
W_prime = np.random.multivariate_normal(mu_hat_for_EV, Sigma_hat_for_EV, S_new)
generated = np.matmul(pca_inverse_transform, W_prime.T)
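Because `R` and `COMPONENTS` come from the author's surrounding project, here is a self-contained sketch of the same generation idea on stand-in data (all names hypothetical): fit PCA, model the score distribution as a multivariate normal, sample new scores, and map them back through the inverse transform.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # hypothetical stand-in data

pca = PCA(n_components=3)
scores = pca.fit_transform(X)                  # (200, 3) component scores

mu_hat = scores.mean(axis=0)                   # mean of each score dimension
Sigma_hat = np.cov(scores, rowvar=False)       # (3, 3) covariance of the scores

new_scores = rng.multivariate_normal(mu_hat, Sigma_hat, size=500)
generated = pca.inverse_transform(new_scores)  # (500, 10) synthetic samples
```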
Author and source
Original post: PCA(Principal Component Analysis) (feat. sklearn), https://velog.io/@hyangki0119/PCAPrincipal-Component-Analysis-feat-sklearn
Attribution: the original author's information is contained in the URL above, and copyright belongs to the original author. (Collection and share based on the CC Protocol.)