[Review] Math of CNN


Preface



In this article, I would like to elaborate on the arithmetic inside a CNN.
I may make mistakes, so please feel free to leave a comment below.

Implementation of CNN in Python with NumPy



Visual Image of Networks



source : https://www.mdpi.com/1099-4300/19/6/242


Math in CNN



In this section, I will write about the mathematical concepts behind convolutional networks and their three key features. As you probably know, the network is composed of many neurons arranged in layers, and each layer is connected to the next. First, the input data is propagated through the layers up to the output layer. Then, in order to strengthen its ability to represent the dataset, the net needs to learn by tuning its parameters while measuring the error between the target and its prediction.
Hence, in this section we will see the two faces of each layer: forward propagation and back propagation.

Three Key Features in CNN


  • Convolutional layer
  • Activation layer
  • Pooling layer (generally we apply max-pooling)

So let me describe them one by one.

    1. Convolutional layer.




    Let's consider the single-image case. For simplicity, I would like to define the image size and the size of the convolved image as above.
    So the notation here is as below.
    $x$ : input
    $a^{(k)}$ : convolved image (feature map)
    $k$ : index of the kernel (weight filter)
    $W$ : kernel (weight filter)
    $b$ : bias
    $E$ : cost function

    forward prop


    a^{(k)}_{ij} = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} W^{(k)}_{st} x_{(i+s)(j+t)} + b^{(k)}
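
    Below is a minimal NumPy sketch of this forward pass (a "valid" convolution with stride 1, written with naive loops). The names x, W and b simply mirror the notation above; this is my own illustration, not code taken from the implementation linked earlier.

    import numpy as np

    def conv2d_forward(x, W, b):
        # valid "convolution" (cross-correlation, exactly as in the equation above)
        M, N = x.shape            # input size
        m, n = W.shape            # kernel size
        a = np.zeros((M - m + 1, N - n + 1))
        for i in range(M - m + 1):
            for j in range(N - n + 1):
                # a_ij = sum_{s,t} W_st * x_{(i+s)(j+t)} + b
                a[i, j] = np.sum(W * x[i:i + m, j:j + n]) + b
        return a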
    

    back prop to update the weight


    \frac{\partial E}{\partial W^{(k)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial W^{(k)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} x_{(i+s)(j+t)}\\
    
    \frac{\partial E}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}}
    

    Bear in mind that the propagated error can be written as below.
    \delta^{k}_{ij} = \frac{\partial E}{\partial a^{(k)}_{ij}}
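
    A NumPy sketch of the two parameter gradients above, assuming delta holds this propagated error $\delta^{(k)}_{ij}$ for one kernel (names are illustrative, not from any particular library):

    import numpy as np

    def conv2d_grad_params(x, delta, kernel_shape):
        # dE/dW and dE/db for a single kernel, given delta = dE/da
        m, n = kernel_shape
        H, V = delta.shape        # H = M - m + 1, V = N - n + 1
        dW = np.zeros((m, n))
        for s in range(m):
            for t in range(n):
                # dE/dW_st = sum_{i,j} delta_ij * x_{(i+s)(j+t)}
                dW[s, t] = np.sum(delta * x[s:s + H, t:t + V])
        db = np.sum(delta)        # dE/db = sum_{i,j} delta_ij
        return dW, db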
    

    back prop to previous layer


    \frac{\partial E}{\partial x_{ij}} = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} \frac{\partial a^{(k)}_{(i-s)(j-t)}}{\partial x_{ij}}
    = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} W^{(k)}_{st}
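
    The same gradient in NumPy, written in the equivalent "scatter" form: every output position (i, j) sends its error back to the input patch it was computed from (again just an illustrative sketch):

    import numpy as np

    def conv2d_grad_input(delta, W, input_shape):
        # dE/dx: accumulate delta_{(i-s)(j-t)} * W_st, i.e. a "full" convolution
        M, N = input_shape
        m, n = W.shape
        dx = np.zeros((M, N))
        for i in range(delta.shape[0]):
            for j in range(delta.shape[1]):
                # output (i, j) saw the patch x[i:i+m, j:j+n], so its error flows back there
                dx[i:i + m, j:j + n] += delta[i, j] * W
        return dx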
    

    2. Activation layer



    When it comes to the selection of the activation function, we indeed have several options, for example the sigmoid or the hyperbolic tangent. So in this section, let me first briefly show the form of each function, and then move on to the propagations.

    Activation Families


  • sigmoid : $\sigma(x) = \frac{1}{1 + e^{-x}}$
  • tanh : $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
  • ReLU : $\mathrm{ReLU}(x) = \max(0, x)$
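
    Written with NumPy, the three functions above look like this (a plain sketch, not tied to any framework):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        return np.tanh(x)              # NumPy already provides tanh

    def relu(x):
        return np.maximum(0.0, x)      # element-wise max(0, x)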
    

    I will pick ReLU as the activation function in this article.
    So let's check its forward prop and backprop.

    forward prop


    a_{ij} = max(0, x_{ij})
    

    backprop


    \frac{\partial E}{\partial x_{ij}} = \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial x_{ij}}
    = \left\{
    \begin{array}{ll}
    \frac{\partial E}{\partial a^{(k)}_{ij}} & (x_{ij} > 0) \\
    0 & (otherwise)
    \end{array}
    \right.
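
    A short NumPy sketch of both passes; the backward pass simply reuses the mask of positive inputs (function names are mine for illustration):

    import numpy as np

    def relu_forward(x):
        return np.maximum(0.0, x)

    def relu_backward(delta, x):
        # delta is dE/da; the gradient passes through only where x > 0
        return delta * (x > 0)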
    

    3. Max Pooling



    Forward prop


    a_{ij} = \max_{s,t} \, x_{(i+s)(j+t)}
    

    where $s \in \{0, 1, \dots, l-1\}$, $t \in \{0, 1, \dots, l-1\}$, and $l$ is the filter size.

    Backward prop


    \frac{\partial E}{\partial x_{(i+s)(j+t)}} = \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial x_{(i+s)(j+t)}}
    = \left\{
    \begin{array}{ll}
    \frac{\partial E}{\partial a^{(k)}_{ij}} & (a^{(k)}_{ij} = x_{(i+s)(j+t)}) \\
    0 & (otherwise)
    \end{array}
    \right.
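
    A minimal NumPy sketch of the usual non-overlapping case (stride equal to the window size $l$, input size assumed divisible by $l$); the backward pass routes the error only to the position that attained the max:

    import numpy as np

    def maxpool_forward(x, l):
        M, N = x.shape
        a = np.zeros((M // l, N // l))
        for i in range(M // l):
            for j in range(N // l):
                a[i, j] = np.max(x[i*l:(i+1)*l, j*l:(j+1)*l])
        return a

    def maxpool_backward(delta, x, l):
        dx = np.zeros_like(x)
        for i in range(delta.shape[0]):
            for j in range(delta.shape[1]):
                patch = x[i*l:(i+1)*l, j*l:(j+1)*l]
                s, t = np.unravel_index(np.argmax(patch), patch.shape)
                dx[i*l + s, j*l + t] = delta[i, j]   # only the max position receives the error
        return dx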
    

    Convolutional Layer (Multi-Channel)



    So far, we haven't considered multi-channel convolutional layers.
    But since we have covered the single-channel connections, it is a good time to move on to more practical nets.

    This is the conceptual image of multi-channel convolutional layers.

    Forward-prop\\
    a^{(k)}_{ij} = \sum_c a^{(k, c)}_{ij} = \sum_c \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} W^{(k, c)}_{st} x^{c}_{(i+s)(j+t)} + b^{(k)}\\
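
    A sketch of this forward pass with NumPy for one kernel $k$, with x of shape (C, M, N) and W of shape (C, m, n); the shapes and names are my own assumptions for illustration:

    import numpy as np

    def conv2d_forward_multichannel(x, W, b):
        # x: (C, M, N) input, W: (C, m, n) kernel k, b: scalar bias of kernel k
        C, M, N = x.shape
        _, m, n = W.shape
        a = np.zeros((M - m + 1, N - n + 1))
        for i in range(M - m + 1):
            for j in range(N - n + 1):
                # sum over channels c and offsets s, t, then add the bias once
                a[i, j] = np.sum(W * x[:, i:i + m, j:j + n]) + b
        return a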
    

    Just as in the forward-prop, the backward-prop only requires adding the channel index $c$ (and summing over it where it appears) to the single-channel equations.
    ** $c$ is the index of the channel.

    Updating Parameter (W) and Backprop to Previous Layer in Multi-Channel



    The multi-channel convolutional layer also has two aspects: one for updating its weights and the other for propagating the error to the previous layer.

    updating parameters


    \frac{\partial E}{\partial W^{(k, c)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial W^{(k, c)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} x^{c}_{(i+s)(j+t)}\\
    
    \frac{\partial E}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}}
    

    backprop to previous layer


    \frac{\partial E}{\partial x^c_{ij}} = \sum_k \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} \frac{\partial a^{(k)}_{(i-s)(j-t)}}{\partial x^c_{ij}}
    = \sum_k \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} W^{(k, c)}_{st}
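
    A sketch of the whole multi-channel backward pass in NumPy, with W of shape (K, C, m, n) and delta of shape (K, M-m+1, N-n+1) holding $\partial E / \partial a^{(k)}$ (shapes and names are my own assumptions):

    import numpy as np

    def conv2d_backward_multichannel(x, W, delta):
        K, C, m, n = W.shape
        H, V = delta.shape[1], delta.shape[2]
        dW = np.zeros_like(W)
        db = np.zeros(K)
        dx = np.zeros_like(x)
        for k in range(K):
            db[k] = np.sum(delta[k])                 # dE/db^{(k)}
            for c in range(C):
                for s in range(m):
                    for t in range(n):
                        # dE/dW^{(k,c)}_{st} = sum_{i,j} delta^{(k)}_{ij} * x^c_{(i+s)(j+t)}
                        dW[k, c, s, t] = np.sum(delta[k] * x[c, s:s + H, t:t + V])
            for i in range(H):
                for j in range(V):
                    # scatter the error of output (i, j) back over the input patch it saw
                    dx[:, i:i + m, j:j + n] += delta[k, i, j] * W[k]
        return dW, db, dx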
    

    Advanced Material



    https://arxiv.org/pdf/1603.07285.pdf
    https://github.com/vdumoulin/conv_arithmetic
