[Review] Math of CNN
Preface
In this article, I would like to elaborate on the mathematics behind CNNs.
But I may make mistakes, so please feel free to leave a comment below.
Implementation of CNN in python with Numpy
Visual Image of Networks
source : https://www.mdpi.com/1099-4300/19/6/242
Math in CNN
In this section, I will write about the mathematical concepts behind convolutional networks and their three key features. As you probably know, the network is composed of many neurons arranged in layers, and each layer is connected to the next through its pipeline. So first, the input data is propagated through the layers up to the output layer. Then, in order to strengthen its ability to represent the dataset, the net needs to learn by tuning its parameters, measuring the error between the target and its prediction.
Hence, in this section we will see two faces of each layer: forward-propagation and back-propagation.
Three Key Features in CNN
So let me describe them one by one.
1. Convolutional layer.
Let's consider the single-image case. For simplicity, I would like to define the input image size as $M \times N$, the kernel size as $m \times n$, and hence the convolved image size as $(M-m+1) \times (N-n+1)$.
So the notation here is as below.
$x$ : input
$a^{(k)}$ : convolved image (feature map)
$k$ : index of the kernel (weight filter)
$W$ : kernel(weight filter)
$b$ : bias
$E$ : Cost Function
forward prop
a^{(k)}_{ij} = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} W^{(k)}_{st} x_{(i+s)(j+t)} + b^{(k)}
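To make this concrete, here is a minimal NumPy sketch of the forward pass above (the function name conv2d_forward and the variable names are mine; it assumes a "valid" convolution with stride 1, exactly matching the sums in the formula):

import numpy as np

def conv2d_forward(x, W, b):
    # x: (M, N) input image, W: (m, n) kernel, b: scalar bias
    M, N = x.shape
    m, n = W.shape
    out_h, out_w = M - m + 1, N - n + 1  # size of the convolved image
    a = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # a_ij = sum_{s,t} W_st * x_(i+s)(j+t) + b
            a[i, j] = np.sum(W * x[i:i+m, j:j+n]) + b
    return a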
back prop to update the weights
\frac{\partial E}{\partial W^{(k)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial W^{(k)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} x_{(i+s)(j+t)}\\
\frac{\partial E}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}}
Bear in mind that the propagated error can be written as below.
\delta^{(k)}_{ij} = \frac{\partial E}{\partial a^{(k)}_{ij}}
back prop to previous layer
\frac{\partial E}{\partial x_{ij}} = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} \frac{\partial a^{(k)}_{(i-s)(j-t)}}{\partial x_{ij}}
= \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} W^{(k)}_{st}
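Here is a NumPy sketch of these three gradients, given the upstream error $\delta = \partial E / \partial a^{(k)}$ (again the names are mine, and it assumes the same "valid", stride-1 setting as the forward pass):

import numpy as np

def conv2d_backward(x, W, delta):
    # x: (M, N) input, W: (m, n) kernel, delta: (M-m+1, N-n+1) = dE/da
    m, n = W.shape
    out_h, out_w = delta.shape
    dW = np.zeros_like(W)
    db = np.sum(delta)  # dE/db = sum_ij delta_ij
    dx = np.zeros_like(x)
    for s in range(m):
        for t in range(n):
            # dE/dW_st = sum_ij delta_ij * x_(i+s)(j+t)
            dW[s, t] = np.sum(delta * x[s:s+out_h, t:t+out_w])
    for i in range(out_h):
        for j in range(out_w):
            # dE/dx: each delta_ij spreads back over its m x n window through W
            dx[i:i+m, j:j+n] += delta[i, j] * W
    return dW, db, dx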
2. Activation layer
When it comes to the selection of activation functions, we indeed have some options, for example the sigmoid or the hyperbolic tangent. So in this section, let me first briefly show each function, then move on to the propagations.
Activation Families
\mathrm{ReLU}(x) = \max(0, x)
In this article, I will pick ReLU as our activation function.
So let's check its forward prop and backprop.
forward prop
a_{ij} = \max(0, x_{ij})
backprop
\frac{\partial E}{\partial x_{ij}} = \frac{\partial E}{\partial a_{ij}} \frac{\partial a_{ij}}{\partial x_{ij}}
= \left\{
\begin{array}{ll}
\frac{\partial E}{\partial a_{ij}} & (x_{ij} > 0) \\
0 & (otherwise)
\end{array}
\right.
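Both directions are one-liners in NumPy; here is a small sketch (the function names are mine):

import numpy as np

def relu_forward(x):
    # a_ij = max(0, x_ij), applied element-wise
    return np.maximum(0, x)

def relu_backward(x, delta):
    # the gradient passes only where the input was positive
    return delta * (x > 0)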
3. Max pooling layer
Forward prop
a_{ij} = \max_{s, t} \ x_{(i+s)(j+t)}
where $s \in \{0, \dots, l-1\}$, $t \in \{0, \dots, l-1\}$ and $l$ is the pooling filter size.
Backward prop
\frac{\partial E}{\partial x_{(i+s)(j+t)}} = \frac{\partial E}{\partial a_{ij}} \frac{\partial a_{ij}}{\partial x_{(i+s)(j+t)}}
= \left\{
\begin{array}{ll}
\frac{\partial E}{\partial a_{ij}} & (a_{ij} = x_{(i+s)(j+t)}) \\
0 & (otherwise)
\end{array}
\right.
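Here is a minimal NumPy sketch of both directions, assuming non-overlapping $l \times l$ windows (stride equal to the filter size) and an input whose side lengths are divisible by $l$; the function names are mine:

import numpy as np

def maxpool_forward(x, l):
    M, N = x.shape
    a = np.zeros((M // l, N // l))
    for i in range(M // l):
        for j in range(N // l):
            # a_ij = max over the l x l window
            a[i, j] = np.max(x[i*l:(i+1)*l, j*l:(j+1)*l])
    return a

def maxpool_backward(x, delta, l):
    # route the upstream gradient only to the max position(s) of each window
    dx = np.zeros_like(x)
    M, N = x.shape
    for i in range(M // l):
        for j in range(N // l):
            window = x[i*l:(i+1)*l, j*l:(j+1)*l]
            mask = (window == np.max(window))
            dx[i*l:(i+1)*l, j*l:(j+1)*l] += mask * delta[i, j]
    return dx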
Convolutional Layer (Multi-Channel)
So far, we haven't considered multi-channel convolutional layers.
But since we have covered the single-channel connections, it's a good time for us to move on to more practical nets.
This is the conceptual image of multi-channel convolutional layers.
Forward prop
a^{(k)}_{ij} = \sum_c a^{(k, c)}_{ij} = \sum_c \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} W^{(k, c)}_{st} x^c_{(i+s)(j+t)} + b^{(k)}
As in the forward prop, the backward prop just needs to keep track of the extra channel index in its math.
** $c$ is the index of the channel.
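As a concrete sketch in NumPy, assuming a channels-first layout (my own convention here) with x of shape (C, M, N), kernels W of shape (K, C, m, n) and biases b of shape (K,):

import numpy as np

def conv2d_forward_multi(x, W, b):
    C, M, N = x.shape
    K, _, m, n = W.shape
    out_h, out_w = M - m + 1, N - n + 1
    a = np.zeros((K, out_h, out_w))
    for k in range(K):
        for c in range(C):
            for i in range(out_h):
                for j in range(out_w):
                    # sum over channels of the per-channel convolutions
                    a[k, i, j] += np.sum(W[k, c] * x[c, i:i+m, j:j+n])
        a[k] += b[k]  # one bias per kernel
    return a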
Updating Parameters (W) and Backprop to the Previous Layer in Multi-Channel
The multi-channel convolutional layer has two aspects as well: one for updating its weights and the other for propagating the error to the previous layer.
updating parameters
\frac{\partial E}{\partial W^{(k, c)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial W^{(k, c)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} x^c_{(i+s)(j+t)}
\frac{\partial E}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}}
backprop to previous layer
\frac{\partial E}{\partial x^c_{ij}} = \sum_k \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} \frac{\partial a^{(k)}_{(i-s)(j-t)}}{\partial x^c_{ij}}
= \sum_k \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} W^{(k, c)}_{st}
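And a matching NumPy sketch of the multi-channel backward pass, with the same assumed array layout as above (delta has shape (K, M-m+1, N-n+1)):

import numpy as np

def conv2d_backward_multi(x, W, delta):
    C, M, N = x.shape
    K, _, m, n = W.shape
    out_h, out_w = delta.shape[1:]
    dW = np.zeros_like(W)
    db = delta.sum(axis=(1, 2))  # dE/db_k = sum_ij delta_k,ij
    dx = np.zeros_like(x)
    for k in range(K):
        for c in range(C):
            for s in range(m):
                for t in range(n):
                    # dE/dW_(k,c),st = sum_ij delta_k,ij * x_c,(i+s)(j+t)
                    dW[k, c, s, t] = np.sum(delta[k] * x[c, s:s+out_h, t:t+out_w])
            for i in range(out_h):
                for j in range(out_w):
                    # dE/dx_c accumulates contributions from every kernel k
                    dx[c, i:i+m, j:j+n] += delta[k, i, j] * W[k, c]
    return dW, db, dx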
Advanced Material
https://arxiv.org/pdf/1603.07285.pdf
https://github.com/vdumoulin/conv_arithmetic
Reference
Original article: https://qiita.com/Rowing0914/items/e815ca24427874030526