Machine Learning Study Notes (Week 4 & 5)
Neural Networks
1. Concept
$a_i^{(j)}$ = "activation" of unit $i$ in layer $j$
$Θ^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$; if the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $Θ^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$ (the extra column multiplies the bias unit)
2. Model Representation
Set $a^{(1)} = x$ and add the bias unit $a_0^{(1)} = 1$. Then, for each layer in turn:

$z^{(j)} = Θ^{(j-1)}a^{(j-1)}$

$a^{(j)} = g(z^{(j)})$

After adding the bias unit $a_0^{(j)} = 1$, the next layer is computed the same way, until the output layer yields the hypothesis:

$z^{(j+1)} = Θ^{(j)}a^{(j)}$

$h_Θ(x) = a^{(j+1)} = g(z^{(j+1)})$
3. Application Example
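A classic illustration from the course is a single sigmoid unit computing the logical AND function. With weights $Θ^{(1)} = [-30, 20, 20]$, the hypothesis is $h_Θ(x) = g(-30 + 20x_1 + 20x_2)$:

| $x_1$ | $x_2$ | $h_Θ(x)$ |
| --- | --- | --- |
| 0 | 0 | $g(-30) \approx 0$ |
| 0 | 1 | $g(-10) \approx 0$ |
| 1 | 0 | $g(-10) \approx 0$ |
| 1 | 1 | $g(10) \approx 1$ |

so the unit outputs (approximately) $x_1$ AND $x_2$.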
4. Feedforward Propagation Computation Example
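To make the computation in sections 2 and 4 concrete, here is a minimal NumPy sketch of the vectorized forward pass above. The function name, the layer sizes, and the randomly drawn weight matrices are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Thetas):
    """Compute h_Theta(x) for one example x (1-D array, features only).

    Thetas[l] maps layer l+1 to layer l+2 and has shape
    (s_{l+2}, s_{l+1} + 1), the extra column absorbing the bias unit.
    """
    a = x                               # a^(1) = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))  # prepend bias unit a_0 = 1
        z = Theta @ a                   # z^(j+1) = Theta^(j) a^(j)
        a = sigmoid(z)                  # a^(j+1) = g(z^(j+1))
    return a                            # h_Theta(x) = a^(L)

# Illustrative 2-3-1 network with arbitrary weights (assumed for the demo).
rng = np.random.default_rng(0)
Thetas = [rng.normal(size=(3, 3)), rng.normal(size=(1, 4))]
print(forward_propagate(np.array([1.0, 0.0]), Thetas))
```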
5. Cost Function
$$
J(Θ) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
\Biggl[ y_k^{(i)} \log\Bigl( \bigl(h_Θ(x^{(i)})\bigr)_k \Bigr)
+ \bigl(1 - y_k^{(i)}\bigr) \log\Bigl( 1 - \bigl(h_Θ(x^{(i)})\bigr)_k \Bigr) \Biggr]
+ \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \bigl(Θ_{j,i}^{(l)}\bigr)^2
$$
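A hedged NumPy sketch of this cost, assuming `H` holds the network outputs $\bigl(h_Θ(x^{(i)})\bigr)_k$ row-wise, `Y` holds one-hot labels of the same shape, and `Thetas` is the list of weight matrices (all names are assumptions for the example):

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost J(Theta).

    H, Y  : (m, K) arrays of hypotheses and one-hot labels.
    Thetas: weight matrices Theta^(l); column 0 holds the bias weights
            and is excluded from regularization.
    """
    m = Y.shape[0]
    # Unregularized term: sum over all examples i and output units k.
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization: squared non-bias weights of every layer.
    J += lam / (2 * m) * sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return J
```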
6. Backpropagation
Set $Δ_{i,j}^{(l)} := 0$ for all $l, i, j$. For training example $t = 1$ to $m$:
1. Set $a^{(1)} := x^{(t)}$ and perform forward propagation to compute $a^{(l)}$ for $l = 2, \dots, L$.
2. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$.
3. Compute $\delta^{(L-1)}, \dots, \delta^{(2)}$ using $\delta^{(l)} = \bigl((Θ^{(l)})^T\delta^{(l+1)}\bigr) .* g'(z^{(l)})$.
*note: $g'(z^{(l)}) = a^{(l)} .* (1-a^{(l)})\,\leftarrow$ (g-prime)
4. Accumulate $Δ_{i,j}^{(l)} := Δ_{i,j}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}$, or with vectorization: $Δ^{(l)} := Δ^{(l)} + \delta^{(l+1)}(a^{(l)})^T$.

Hence we update our $Δ$ accumulator matrices; dividing by $m$ (and adding the regularization term $\frac{\lambda}{m}Θ_{i,j}^{(l)}$ for $j \neq 0$) yields the gradient $\frac{\partial}{\partial Θ_{i,j}^{(l)}} J(Θ) = D_{i,j}^{(l)}$.
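A sketch of these steps in NumPy for a three-layer network (one hidden layer), vectorized over all $m$ examples at once rather than the per-example loop above; the function and variable names are assumptions for illustration:

```python
import numpy as np

def backprop_gradients(X, Y, Theta1, Theta2, lam):
    """Gradients D^(l) of J for a 3-layer network.

    X: (m, n) inputs, Y: (m, K) one-hot labels,
    Theta1: (h, n+1), Theta2: (K, h+1).
    """
    m = X.shape[0]
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Forward pass (bias units prepended as a column of ones).
    A1 = np.hstack([np.ones((m, 1)), X])        # a^(1)
    Z2 = A1 @ Theta1.T
    G2 = sigmoid(Z2)
    A2 = np.hstack([np.ones((m, 1)), G2])       # a^(2)
    A3 = sigmoid(A2 @ Theta2.T)                 # h_Theta(x) = a^(3)

    # Backward pass.
    D3 = A3 - Y                                 # delta^(3)
    D2 = (D3 @ Theta2[:, 1:]) * (G2 * (1 - G2)) # delta^(2) via g'(z^(2))

    # Accumulate Delta = delta^(l+1) (a^(l))^T, average, regularize.
    Grad1 = D2.T @ A1 / m
    Grad2 = D3.T @ A2 / m
    Grad1[:, 1:] += lam / m * Theta1[:, 1:]     # skip bias column j = 0
    Grad2[:, 1:] += lam / m * Theta2[:, 1:]
    return Grad1, Grad2
```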
7. Putting It Together: Training a Neural Network
1. Randomly initialize the weights.
2. Implement forward propagation to get $h_Θ(x^{(i)})$ for any $x^{(i)}$.
3. Implement the cost function.
4. Implement backpropagation to compute the partial derivatives.
5. Use gradient checking to confirm that your backpropagation works; then disable gradient checking (see the sketch after the loop below).
6. Use gradient descent or a built-in optimization function to minimize the cost function with the weights in $Θ$.
When we perform forward and back propagation, we loop over every training example:

```
for i = 1:m,
    % Perform forward propagation and backpropagation using example (x(i), y(i))
    % (get activations a(l) and delta terms delta(l) for l = 2, ..., L).
end;
```
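Step 5's gradient check can be done with a centered difference. A minimal sketch, where `cost` stands for any function computing $J(Θ)$ over the unrolled parameter vector (the name is an assumption):

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Approximate dJ/dtheta_i with the centered difference
    (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps) for each i."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (cost(theta + e) - cost(theta - e)) / (2 * eps)
    return grad

# Compare this against the backpropagation gradient; the two should agree
# to several decimal places, after which gradient checking is disabled
# (it is far too slow to leave on during training).
```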