Machine Learning Study Notes (Week 3)


Supervised Learning



Classification Problem



1. Hypothesis Function:
  • "Sigmoid Function" or "Logistic Function":
    $g(z) = \frac{1}{1+e^{-z}}$
  • $h_\theta(x) = g(\theta^Tx) = \frac{1}{1 + e^{-\theta^Tx}}$
    (an Octave sketch of this hypothesis follows after this list)
    


  • Interpretation:
    $h_\theta(x)$ gives the probability that the output is 1:
    $h_\theta(x) = P(y = 1 \mid x;\theta) = 1 - P(y = 0 \mid x;\theta)$

  • Decision Boundary:
  • when
    $h_\theta(x)\geq 0.5\rightarrow y=1, h_\theta(x) < 0.5\rightarrow y=0$
    means $g(z)\geq 0.5\rightarrow z\geq 0\rightarrow y=1 $
  • $z$ is the input to $g$ (e.g. $z = \theta^Tx$)
  • the decision boundary can be any shape; it is not necessarily a straight line




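    A minimal Octave sketch of the sigmoid hypothesis (the helper name sigmoid is my own choice, not prescribed by the course):
    function g = sigmoid(z)
      g = 1 ./ (1 + exp(-z));    % element-wise, so z can be a scalar, a vector, or a matrix
    end
    h = sigmoid(X * theta);      % h(i) is the predicted probability that y = 1 for example i
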
2. Cost Function:
    $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr)+(1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr]$

    Vectorized implementation:
    $h = g(X\theta)$
    $J(\theta) = \frac{1}{m}\bigl(-y^T\log(h)-(1-y)^T\log(1-h)\bigr)$
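
    A minimal Octave sketch of this vectorized cost, assuming X (the m x (n+1) design matrix), y (the m x 1 label vector), theta, and the sigmoid helper from above are in scope:
    m = length(y);
    h = sigmoid(X * theta);
    J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));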

3. Gradient Descent:
    Repeat {
    $\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}$
    }
    Vectorized implementation:
    $\theta :=\theta -\frac{\alpha}{m}X^T\bigl(g(X\theta)-\vec{y}\bigr)$
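
    A minimal Octave sketch of this vectorized update, assuming a learning rate alpha and an iteration count num_iters chosen by hand:
    for iter = 1:num_iters
      theta = theta - (alpha / m) * X' * (sigmoid(X * theta) - y);
    end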

    Advanced Optimization


  • Optimization algorithms:
  • Gradient descent
  • Conjugate gradient
  • BFGS
  • L-BFGS


  • Code:
    First, we need to provide a function that evaluates both
    $J(\theta)$ and
    $\frac{\partial}{\partial\theta_j}J(\theta)$:
    function [jVal, gradient] = costFunction(theta)
      jVal = [...code to compute J(theta)...];
      gradient = [...code to compute derivative of J(theta)...];
    end
    

    Then we use the "fminunc()" optimization algorithm along with the "optimset()" function, which creates an object containing the options we want to send to "fminunc()".
    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2,1);
       [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
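
    As one possible (unofficial) way of filling in the placeholders above, the logistic regression cost and gradient from sections 2 and 3 could be used; X and y would need to be visible inside the function, e.g. passed through an anonymous function such as @(t) costFunction(t, X, y):
    function [jVal, gradient] = costFunction(theta, X, y)
      m = length(y);
      h = sigmoid(X * theta);                                   % sigmoid helper from above
      jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));  % J(theta)
      gradient = (1 / m) * X' * (h - y);                        % vector of partial derivatives
    end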
    

    Multiclass Classification: One-vs-all



    Multiclass means $y \in \{0, 1, \dots, n\}$. Simply apply the same logistic regression algorithm to each class:

    Train a logistic regression classifier $h_\theta(x)$ for each class to predict the probability that $y = i$.

    To make a prediction on a new $x$, pick the class that maximizes $h_\theta(x)$.
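
    A minimal Octave sketch of the prediction step, assuming all_theta is a K x (n+1) matrix whose i-th row holds the trained parameters for class i (the variable names are my own):
    probs = sigmoid(X * all_theta');            % m x K matrix of per-class probabilities
    [maxProb, prediction] = max(probs, [], 2);  % for each example, pick the class with the largest h_theta(x)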



    The Problem of Overfitting



    Too many features and an overly complicated hypothesis function lead to high variance (overfitting):


    1. Regularized Cost Function:
    Regularize all of the theta parameters in a single summation:
    $\min_\theta J(\theta) = \frac{1}{2m}\Biggl[\sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2 + \lambda\sum_{j=1}^n\theta_j^2\Biggr]$

    The $\lambda$, or lambda, is the regularization parameter.
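
    A minimal Octave sketch of this regularized cost for linear regression (theta(1) corresponds to $\theta_0$ and is excluded from the penalty):
    m = length(y);
    h = X * theta;                               % linear hypothesis
    J = (1 / (2 * m)) * sum((h - y) .^ 2) ...
        + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);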

    2. Regularized Gradient Descent:
  • Regularized Linear Regression:
    Repeat {
    $\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_0^{(i)}$
    $\theta_j := \theta_j - \alpha\Biggl[\Bigl(\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}\Bigr)+\frac{\lambda}{m}\theta_j\Biggr]\qquad j\in\{1,2,\dots,n\}$
    }

    $\theta_j$ can also be represented as:
    $\theta_j := \theta_j\bigl(1-\alpha\frac{\lambda}{m}\bigr)-\alpha\frac{1}{m}\sum_{i=1}^m\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}$

    Intuitively, you can see it as reducing the value of $\theta_j$ by some amount on every update.
  • Normal Equation:

  • $\theta = (X^TX+\lambda L)^{-1}X^Ty$
    where $L =
    \begin{bmatrix}
    0 & & & & \\
     & 1 & & & \\
     & & 1 & & \\
     & & & \ddots & \\
     & & & & 1 \\
    \end{bmatrix}$
    is an $(n+1)\times(n+1)$ matrix.
    
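  • A minimal Octave sketch of this equation (eye builds the identity matrix; its (1,1) entry is zeroed so that $\theta_0$ is not regularized):
    L = eye(n + 1);
    L(1, 1) = 0;
    theta = pinv(X' * X + lambda * L) * (X' * y);
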
  • Regularized Logistic Regression:

  • Repeat {
    $\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^m \bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_0^{(i)}$
    $\theta_j := \theta_j - \alpha\Biggl[\Bigl(\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}\Bigr)+\frac{\lambda}{m}\theta_j\Biggr]\qquad j\in\{1,2,\dots,n\}$
    }
    This looks identical to the regularized linear regression update, but here $h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}$.

    Cost function (regularized):
    $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log\bigl(h_\theta(x^{(i)})\bigr)+(1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$
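
    A minimal Octave sketch of this regularized cost together with its gradient, in the [jVal, gradient] form that fminunc expects ($\theta_0$, stored in theta(1), is left unregularized; the function name is my own):
    function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
      m = length(y);
      h = sigmoid(X * theta);
      jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
             + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
      gradient = (1 / m) * X' * (h - y);
      gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
    end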
