Machine Learning Study Notes (Week 3)
Supervised Learning
Classification Problem
1. Hypothesis Function:
$g(z) =\frac{1}{1+e^{-z}} $
$h_\theta(x) = g(\theta^Tx) = \frac{1}{1 + e^{-\theta^Tx}}$
$h_\theta(x)$ gives the probability that the output is 1:
$ h_\theta (x) = P (y = 1 | x;\theta) = 1 - P (y = 0 | x;\theta) $
Decision Boundary:
$h_\theta(x)\geq 0.5\rightarrow y=1, h_\theta(x) < 0.5\rightarrow y=0$
which means $g(z)\geq 0.5 \rightarrow z\geq 0$, i.e. $\theta^Tx \geq 0 \rightarrow y=1$
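A minimal Octave sketch of the sigmoid hypothesis and the 0.5 decision threshold (the values of theta and x are purely illustrative, not from the course):
% Hypothetical example: theta and one feature vector x (with x0 = 1 prepended)
theta = [-3; 1; 1];
x = [1; 2; 2];

g = @(z) 1 ./ (1 + exp(-z));   % sigmoid g(z)
h = g(theta' * x);             % h_theta(x) = g(theta' * x), a value in (0, 1)

prediction = (h >= 0.5);       % predict y = 1 exactly when theta' * x >= 0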
2. Cost Function:
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr)+(1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr]$
Vectorized implementation:
$h=g(X\theta)$
$J(\theta)=\frac{1}{m}\bigl(-y^T\log(h)-(1-y)^T\log(1-h)\bigr)$
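A sketch of this vectorized cost computation in Octave; X, y, and theta here are small hypothetical example values:
% Hypothetical training set: m = 3 examples, intercept column x0 = 1 included in X
X = [1 1; 1 2; 1 3];
y = [0; 0; 1];
theta = [-2; 1];
m = length(y);

h = 1 ./ (1 + exp(-(X * theta)));                    % h = g(X*theta), an m x 1 vector
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));  % scalar cost J(theta)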
3. Gradient Descent:
Repeat {
$\quad \theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}$
}
Vectorized implementation:
$\theta :=\theta -\frac{\alpha}{m}X^T\bigl(g(X\theta)-\vec{y}\bigr)$
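A sketch of the full gradient-descent loop using this vectorized update; the learning rate and iteration count are arbitrary illustrative choices:
% Hypothetical data; alpha and the iteration count are not tuned values
X = [1 1; 1 2; 1 3];
y = [0; 0; 1];
theta = zeros(2, 1);
alpha = 0.1;
m = length(y);
g = @(z) 1 ./ (1 + exp(-z));

for iter = 1:1000
  theta = theta - (alpha/m) * X' * (g(X * theta) - y);  % simultaneous update of every theta_j
end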
Advanced Optimization
Code:
First, we need to provide a function that evaluates both $J(\theta)$ and $\frac{\partial}{\partial\theta_j}J(\theta)$:
function [jVal, gradient] = costFunction(theta)
jVal = [...code to compute J(theta)...];
gradient = [...code to compute derivative of J(theta)...];
end
Then we use the "fminunc()" optimization algorithm along with the "optimset()" function, which creates an object containing the options we want to send to "fminunc()".
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
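As a concrete, unregularized example, the placeholders above could be filled in as follows. This is a sketch, and it passes the data X and y explicitly (a variation on the one-argument signature shown above), saved as costFunction.m:
function [jVal, gradient] = costFunction(theta, X, y)
  % Unregularized logistic regression cost and gradient
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));                     % hypothesis g(X*theta)
  jVal = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));
  gradient = (1/m) * X' * (h - y);                      % vector of partial derivatives
end
The extra arguments can then be fixed with an anonymous function when calling the optimizer, e.g. fminunc(@(t) costFunction(t, X, y), initialTheta, options).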
Multiclass Classification: One-vs-all
Multiclass means $y \in \{0, 1, \dots, n\}$. Simply apply the same logistic regression algorithm to each class:
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.
To make a prediction on a new $x$, pick the class that maximizes $h_\theta^{(i)}(x)$.
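A sketch of the one-vs-all prediction step; all_theta is a hypothetical matrix whose row $i$ holds the trained parameters for class $i$, and the data values are illustrative:
% Hypothetical example: 3 classes, 2 features plus the intercept term
all_theta = [ 1 -1  0;    % row i holds the trained theta' for class i
              0  1 -1;
             -1  0  1];
X = [1 0.5 0.2;           % m = 2 examples, intercept column included
     1 0.1 0.9];

probs = 1 ./ (1 + exp(-(X * all_theta')));    % m x 3 matrix of h_theta(x) for each class
[max_prob, predictions] = max(probs, [], 2);  % for each example, the class with the largest h_theta(x)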
The Problem of Overfitting
Too many features or an overly complicated hypothesis function leads to high variance (overfitting). Regularization addresses this:
1. Regularized Cost Function:
Regularize all of the theta parameters in a single summation:
$\min_\theta J(\theta) = \frac{1}{2m}\Bigl[\sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2 + \lambda\sum_{j=1}^n\theta_j^2\Bigr]$
The $\lambda$, or lambda, is the regularization parameter.
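A sketch of this regularized (squared-error) cost in Octave; the data and lambda are illustrative values, and theta(1), which corresponds to $\theta_0$ in Octave's 1-based indexing, is excluded from the penalty:
% Hypothetical data; lambda is an illustrative value
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0.5; 0.9];
lambda = 1;
m = length(y);

h = X * theta;                                                       % linear hypothesis
J = (1/(2*m)) * (sum((h - y).^2) + lambda * sum(theta(2:end).^2));   % theta(1) = theta_0 is not penalized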
2. Regularized Gradient Descent:
Repeat {
$\quad \theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_0^{(i)}$
$\quad \theta_j := \theta_j - \alpha\Bigl[\Bigl(\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}\Bigr)+\frac{\lambda}{m}\theta_j\Bigr]\qquad j \in \{1,2,\dots,n\}$
}
$\theta_j$ can also be represented as:
$\theta_j := \theta_j\bigl(1-\alpha\frac{\lambda}{m}\bigr) - \alpha\frac{1}{m}\sum_{i=1}^m\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}$
Intuitively, the first term shrinks $\theta_j$ by a factor slightly less than 1 on every update, while the second term is the usual gradient step.
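A sketch of one regularized update in this shrinkage form, for the linear-regression case and with illustrative values; theta(1) corresponds to $\theta_0$ and is left unregularized:
% Hypothetical values
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0.5; 0.9];
alpha = 0.01;
lambda = 1;
m = length(y);

grad = (1/m) * X' * (X * theta - y);                                      % unregularized gradient
theta(1) = theta(1) - alpha * grad(1);                                    % theta_0: no shrinkage
theta(2:end) = theta(2:end) * (1 - alpha*lambda/m) - alpha * grad(2:end); % shrink, then step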
3. Regularized Normal Equation:
$\theta = (X^TX+\lambda L)^{-1}X^Ty$
where $L$ is the $(n+1)\times(n+1)$ matrix
$L = \begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}$
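A sketch of the regularized normal equation in Octave; the design matrix and lambda are hypothetical, and the linear system is solved with the backslash operator rather than an explicit inverse:
% Hypothetical design matrix (intercept column included) and targets
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
lambda = 1;

L = eye(size(X, 2));     % (n+1) x (n+1) identity matrix ...
L(1, 1) = 0;             % ... with the top-left entry zeroed so theta_0 is not regularized
theta = (X' * X + lambda * L) \ (X' * y);   % solve the linear system instead of inverting explicitly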
4. Regularized Logistic Regression:
Gradient descent (the update rule has the same form as above, but with the sigmoid hypothesis $h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$):
Repeat {
$\quad \theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_0^{(i)}$
$\quad \theta_j := \theta_j - \alpha\Bigl[\Bigl(\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}\Bigr)+\frac{\lambda}{m}\theta_j\Bigr]\qquad j \in \{1,2,\dots,n\}$
}
Cost function (regularized):
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr)+(1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$
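Putting the pieces together, the costFunction sketch from the Advanced Optimization section could be extended with this regularized cost and its gradient. This is a sketch; the name costFunctionReg and the explicit data arguments are assumptions, not course-provided code:
function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
  % Regularized logistic regression cost and gradient; theta(1) = theta_0 is not penalized
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));
  jVal = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
         + (lambda/(2*m)) * sum(theta(2:end).^2);
  gradient = (1/m) * X' * (h - y);
  gradient(2:end) = gradient(2:end) + (lambda/m) * theta(2:end);
end
As before, it can be minimized with fminunc via an anonymous function, e.g. fminunc(@(t) costFunctionReg(t, X, y, lambda), initialTheta, options).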
Reference
https://qiita.com/CHrIs23436939/items/8b6f9feda5c8178e02bc