Machines Learning 学习笔记(Week9)
7226 단어 MachineLearning
Anomaly Detection
Algorithm (based on Gaussian Distribution)
1, Choose features $x_i$ that you think might be indicative of anomalous examples
2, Fit parameters $\mu_1,...,\mu_n,\sigma_1^2,...,\sigma_n^2$
\mu_j = \frac{1}{m}\sum_{i=1}^mx_j^{(i)}\\
\sigma_j^2 = \frac{1}{m}\sum_{i=1}^m(x_j^{(i)}-\mu_j)^2
3, Given new examples x, compute $p(x)$:
p(x) = \prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi}\sigma_j}exp(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2})
Anomaly if $p(x)<\varepsilon$
개발 및 평가
Assume we have some labeled data, of anomalous and non-anomalous examples (y=0 if normal, y=1 if anomalous).
Training set(60%): $x^{(1)},x^{(2)},...,x^{(m)}$(assume normal examples/not anomalous)
Cross validation set(20%):$(x_{cv}^{(1)},y_{cv}^{(1)}),...,(x_{cv}^{(m_{cv}) },y_{cv}^{(m_{cv})})$
Test set(20%):$(x_{test}^{(1)},y_{test}^{(1)}),...,(x_{test}^{(m_{test})} ,y_{test}^{(m_{test})})$
Aircraft engines example
10000 good (normal) engines, 20 flawed engines (anomalous)
Training set: 6000 good engines
CV set: 2000 good engines, 10 anomalous
Test set: 2000 good engines, 10 anomalous
평가
Fit model p(x) on the training set
On a cross validation set, predict:
- y=1 if p(x)<ε (anomaly)
- y=0 if p(x)>=ε (normal)
Possible evaluation metrics:
- Precision/Recall
- F1 Score
Can also use cross validation set to choose parameter ε
Anomaly Detection vs. Supervised Learning
Anomaly Detection
Supervised Learning
Very small number of positive examples(y=1). (0-20 is common). Large number of negative examples(y=0).
Large number of positive and negative examples.
Many different "types"of anomalies. Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look like nothing like any of the anomalous examples we've seen so far.
Enough positive examples for algorithm to get a sense of what positive examples are like, future positive examples likely to be similar to ones in training set.
Non-gaussian features
Transform the data: $log(x),\,log(x+c),\,x^{\frac{1}{2}},\,x^{\frac{1}{3}}.. .$

Multivariate Gaussian Distribution (Optional)
Purpose: capture correlations between different features in case of unusual combinations of features
Anomaly detection with the multivariate Gaussian
1, Fit model p(x) by setting
\mu = \frac{1}{m}\sum_{i=1}^mx^{(i)}\\
\Sigma = \frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu)(x^{(i)}-\mu)^T\\
(\mu \in R^n,\,\Sigma\in R^{n*n})
2, Given a new example x, compute
p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}exp\Bigl(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\Bigr)\\
(|\Sigma|=determinant\,of\,\Sigma)
Flag an anomaly if $p(x) <\varepsilon$
Relationship with original model
\mu_j = \frac{1}{m}\sum_{i=1}^mx_j^{(i)}\\
\sigma_j^2 = \frac{1}{m}\sum_{i=1}^m(x_j^{(i)}-\mu_j)^2
p(x) = \prod_{j=1}^n p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi}\sigma_j}exp(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2})
\mu = \frac{1}{m}\sum_{i=1}^mx^{(i)}\\
\Sigma = \frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu)(x^{(i)}-\mu)^T\\
(\mu \in R^n,\,\Sigma\in R^{n*n})
p(x) = \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}exp\Bigl(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\Bigr)\\
(|\Sigma|=determinant\,of\,\Sigma)



Original Model vs. Multivariate Gaussian

Recommender Systems

$r(i,j)=1$ if user j has rated movie i (0 otherwise)
$y^{(i,j)}=$ rating by user j on movie i (if defined)
$\theta^{(j)}=$ parameter vector for user j
$x^{(i)}=$ feature vector for movie i
For user j, movie i, predicted rating: $(\theta^{(j)})^T(x^{(i)})$
$n_u=$ no. users
$n_m=$ no. movies
$m^{(j)}=$ no. of movies rated by user j
Content-based Recommendations
We know what the content is and how the content is. We are trying to predict users' preferences.
Optimization algorithm:
\min_{\theta^{(1)},...,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} \Bigl((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\Bigr)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(\theta_k^{(j)})^2\\
(\theta^{(j)} \in R^{n+1})
Gradient descent update:

Collaborative Filtering
We don't know the content or the users' preferences. But we are trying to:

$x\in R^n$
$\theta\in R^n$
Don't need $x_0$ and $\theta_0$
Collaborative filtering algorithm

Vectorization: Low rank matrix factorization

Find related products (movies)
For each product i, we learn a feature vector $x^{(i)}\in R^n$.
Find the products (movies) j with the smallest $||x^{(i)}-x^{(j)}||$.
Mean Normalization
For users who have not rated any movies, cannot predict initial values (all 0s):

Reference
이 문제에 관하여(Machines Learning 学习笔记(Week9)), 우리는 이곳에서 더 많은 자료를 발견하고 링크를 클릭하여 보았다
https://qiita.com/CHrIs23436939/items/c82c2f3a48976aaca50c
텍스트를 자유롭게 공유하거나 복사할 수 있습니다.하지만 이 문서의 URL은 참조 URL로 남겨 두십시오.
우수한 개발자 콘텐츠 발견에 전념
(Collection and Share based on the CC Protocol.)
\min_{\theta^{(1)},...,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} \Bigl((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\Bigr)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(\theta_k^{(j)})^2\\
(\theta^{(j)} \in R^{n+1})
Reference
이 문제에 관하여(Machines Learning 学习笔记(Week9)), 우리는 이곳에서 더 많은 자료를 발견하고 링크를 클릭하여 보았다 https://qiita.com/CHrIs23436939/items/c82c2f3a48976aaca50c텍스트를 자유롭게 공유하거나 복사할 수 있습니다.하지만 이 문서의 URL은 참조 URL로 남겨 두십시오.
우수한 개발자 콘텐츠 발견에 전념
(Collection and Share based on the CC Protocol.)