Machine Learning Study Notes (Week 1 & 2)
Supervised Learning
Linear Regression Model
1. Hypothesis Function: (the function that best fits the training set)
h_\theta(x) = \theta_0 + \theta_1 x_1
$\theta_i$'s: Parameters
2. Cost Function: (measures the performance/accuracy of the hypothesis function)
J(\theta_0,\theta_1) = \frac{1}{2m} \sum_{i=1}^m \bigl( h_\theta(x_i) - y_i \bigr)^2
$m$ is the number of training examples (the size of the training set)
$(x_i , y_i)$ is called a training example
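As a concrete illustration, here is a minimal Octave sketch of this cost function for the univariate case; the vectors `x`, `y` and the parameter values are hypothetical toy data, not taken from the notes.

```matlab
% Minimal sketch: squared-error cost J(theta_0, theta_1) for univariate linear regression.
x = [1; 2; 3; 4];            % hypothetical feature values
y = [2; 4; 6; 8];            % hypothetical targets
theta0 = 0; theta1 = 2;      % candidate parameters to evaluate

m = length(y);               % number of training examples
h = theta0 + theta1 * x;     % hypothesis h_theta(x_i) for every example
J = (1 / (2 * m)) * sum((h - y) .^ 2)   % here the fit is exact, so J = 0
```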
3. Gradient Descent: (the algorithm used to find the parameters $\theta$ that minimize the cost function $J$)
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
$\alpha$ is the learning rate (the step size of descent)
$\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$ is the partial derivative of $J$ with respect to $\theta_j$
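Expanding the partial derivatives for the univariate hypothesis above (a standard step not written out in these notes) gives the concrete update rules:

\begin{align}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m \bigl( h_\theta(x_i) - y_i \bigr) \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m \bigl( h_\theta(x_i) - y_i \bigr)\, x_i
\end{align}

Both parameters are updated simultaneously on each iteration.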
Multivariate Linear Regression
1. Hypothesis Function:
\begin{align}
h_\theta(x)
&= \theta_0 x_0 + \theta_1 x_1 + ... +\theta_n x_n \\
&= \theta^T x
\end{align}
This is the vectorized form of the hypothesis, where $x_0 = 1$ by convention.
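A small Octave sketch of this vectorized form; the parameter values and the design matrix `X` are hypothetical.

```matlab
% Sketch: vectorized hypothesis h_theta(x) = theta' * x, with x_0 = 1.
theta = [1; 0.5; 2];     % theta_0, theta_1, theta_2 (hypothetical values)
x = [1; 3; 4];           % one example, including the bias term x_0 = 1
h = theta' * x           % single prediction: 1 + 0.5*3 + 2*4 = 10.5

X = [1 3 4;              % design matrix: one row per example,
     1 5 2];             % first column is all ones (x_0)
h_all = X * theta        % predictions for every example at once
```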
2. Gradient Descent for Multiple Variables:
repeat until convergence: {
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x^{(i)}_j
\qquad \text{for } j := 0 \ldots n
}
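A minimal Octave sketch of this update, vectorized over all features; `X` is assumed to be the $m \times (n+1)$ design matrix with a leading column of ones, and `alpha`/`num_iters` are hypothetical choices.

```matlab
% Sketch: batch gradient descent for multivariate linear regression.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    % Simultaneous update of every theta_j; the gradient is (1/m) * X' * (X*theta - y).
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);  % track J per iteration
  end
end
```

Here `X' * (X * theta - y)` computes, for every $j$ at once, the sum $\sum_{i=1}^m \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) x^{(i)}_j$ that appears in the update above.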
3. Gradient Descent in Practice:
x_i := \frac {x_i - \mu_i}{s_i}
where
$\mu_i$ is the average of all the values for feature $i$, and
$s_i$ is the range of values (max − min), or the standard deviation.
We can speed up gradient descent by having each of our input values in roughly the same range.
Ideally, $-1 \leq x_i \leq 1$, or $-0.5 \leq x_i \leq 0.5$, for every feature.
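A short Octave sketch of this mean normalization; here `X` excludes the column of ones, and the standard deviation is used as the scale $s_i$ (one of the two options mentioned above).

```matlab
% Sketch: mean normalization of each feature column of X.
function [X_norm, mu, sigma] = featureNormalize(X)
  mu = mean(X);                 % mu_i: mean of each feature column
  sigma = std(X);               % s_i: standard deviation of each feature column
  % Implicit broadcasting subtracts/divides column-wise (Octave; MATLAB R2016b+):
  X_norm = (X - mu) ./ sigma;
end
```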
Plot the cost function $J(\theta)$ against the number of iterations of gradient descent. If $J(\theta)$ ever increases, you probably need to decrease $\alpha$.
It has been proven that if learning rate α is sufficiently small, then J(θ) will decrease on every iteration.
To choose $\alpha$, try
..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
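A small Octave sketch of this check: run a few candidate values of $\alpha$ and plot $J(\theta)$ per iteration. `X` (with its bias column) and `y` are assumed to already be loaded and normalized, and the candidate list and iteration count are just examples.

```matlab
% Sketch: compare candidate learning rates by plotting J(theta) per iteration.
alphas = [0.01 0.03 0.1 0.3];
num_iters = 50;
m = length(y);

figure; hold on;
for a = alphas
  theta = zeros(size(X, 2), 1);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - (a / m) * (X' * (X * theta - y));            % vectorized update
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
  end
  plot(1:num_iters, J_history);
end
xlabel('iterations'); ylabel('J(\theta)');
legend('alpha = 0.01', 'alpha = 0.03', 'alpha = 0.1', 'alpha = 0.3');
```

A curve that decreases quickly and flattens indicates a good $\alpha$; a curve that increases or oscillates indicates $\alpha$ is too large.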
(Methods to improve the hypothesis function)
Normal Equation
An alternative way of minimizing the cost function $J$: solve for $\theta$ analytically in one step.
\theta = (X^T X) ^{-1} X^Ty
e.g., in Octave/MATLAB: `pinv(X' * X) * X' * y`
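A toy Octave sketch of that one-liner on made-up data (the numbers are hypothetical); note that no feature scaling and no iterations are needed for the normal equation.

```matlab
% Sketch: normal equation theta = pinv(X'X) X'y on a tiny made-up data set.
X = [1 2104 3;               % design matrix: first column of ones (x_0),
     1 1600 3;               % then two features per example
     1 2400 3;
     1 1416 2];
y = [400; 330; 369; 232];    % targets
theta = pinv(X' * X) * X' * y   % closed-form solution, no iterations, no alpha
```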
When to use the Normal Equation vs. Gradient Descent ($n$ = number of features): gradient descent needs a choice of $\alpha$ and many iterations, but works well even when $n$ is large; the normal equation needs no $\alpha$ and no iterations, but computing $(X^T X)^{-1}$ costs roughly $O(n^3)$, so it becomes slow when $n$ is very large (on the order of $10^4$ or more).
Reference
https://qiita.com/CHrIs23436939/items/5621b9d94652966a343e