[Review] Math of CNN


Preface



In this article, I would like to elaborate on the arithmetic inside a CNN.
I may make mistakes, so please feel free to leave a comment below.

Implementation of CNN in Python with NumPy



Visual Image of Networks



source : https://www.mdpi.com/1099-4300/19/6/242


Math in CNN



In this section, I will write about the mathematical concepts behind convolutional networks and their three key features. As you probably know, the network is composed of many neurons arranged in layers, and each layer is connected to the next. First, the input data is propagated through the layers up to the output layer. Then, in order to strengthen its ability to represent the dataset, the net needs to learn by tuning its parameters while measuring the error between the target and its prediction.
Hence, in this section we will see the two faces of each layer: forward propagation and back propagation.

Three Key Features in CNN


  • Convolutional layer
  • Activation layer
  • Pooling layer (generally we apply max-pooling)

So let me describe them one by one.

    1. Convolutional layer.




    Let's consider the single-image case. For simplicity, I would like to define the image size and the size of the convolved image as above.
    So the notation here is as below.
    $x$ : input
    $a^{(k)}$ : convolved image (feature map)
    $k$ : index of the kernel (weight filter)
    $W$ : kernel (weight filter)
    $b$ : bias
    $E$ : cost function

    forward prop


    a^{(k)}_{ij} = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} W^{(k)}_{st} x_{(i+s)(j+t)} + b^{(k)}
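
    Below is a minimal NumPy sketch of this forward pass (a "valid" convolution with stride 1, written with naive loops). The names x, W and b simply mirror the notation above; this is my own illustration, not code taken from the implementation linked earlier.

    import numpy as np

    def conv2d_forward(x, W, b):
        # valid "convolution" (cross-correlation, exactly as in the equation above)
        M, N = x.shape            # input size
        m, n = W.shape            # kernel size
        a = np.zeros((M - m + 1, N - n + 1))
        for i in range(M - m + 1):
            for j in range(N - n + 1):
                # a_ij = sum_{s,t} W_st * x_{(i+s)(j+t)} + b
                a[i, j] = np.sum(W * x[i:i + m, j:j + n]) + b
        return a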
    

    back prop to update the weight


    \frac{\partial E}{\partial W^{(k)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial W^{(k)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} x_{(i+s)(j+t)}\\
    
    \frac{\partial E}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}}
    

    Bear in mind that the propagated error can be written as below.
    \delta^{k}_{ij} = \frac{\partial E}{\partial a^{(k)}_{ij}}
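
    A NumPy sketch of the two parameter gradients above, assuming delta holds this propagated error $\delta^{(k)}_{ij}$ for one kernel (names are illustrative, not from any particular library):

    import numpy as np

    def conv2d_grad_params(x, delta, kernel_shape):
        # dE/dW and dE/db for a single kernel, given delta = dE/da
        m, n = kernel_shape
        H, V = delta.shape        # H = M - m + 1, V = N - n + 1
        dW = np.zeros((m, n))
        for s in range(m):
            for t in range(n):
                # dE/dW_st = sum_{i,j} delta_ij * x_{(i+s)(j+t)}
                dW[s, t] = np.sum(delta * x[s:s + H, t:t + V])
        db = np.sum(delta)        # dE/db = sum_{i,j} delta_ij
        return dW, db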
    

    back prop to previous layer


    \frac{\partial E}{\partial x_{ij}} = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} \frac{\partial a^{(k)}_{(i-s)(j-t)}}{\partial x_{ij}}
    = \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} W^{(k)}_{st}
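
    The same gradient in NumPy, written in the equivalent "scatter" form: every output position (i, j) sends its error back to the input patch it was computed from (again just an illustrative sketch):

    import numpy as np

    def conv2d_grad_input(delta, W, input_shape):
        # dE/dx: accumulate delta_{(i-s)(j-t)} * W_st, i.e. a "full" convolution
        M, N = input_shape
        m, n = W.shape
        dx = np.zeros((M, N))
        for i in range(delta.shape[0]):
            for j in range(delta.shape[1]):
                # output (i, j) saw the patch x[i:i+m, j:j+n], so its error flows back there
                dx[i:i + m, j:j + n] += delta[i, j] * W
        return dx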
    

    2. Activation layer



    When it comes to the selection of the activation function, we indeed have several options, for example the sigmoid or the hyperbolic tangent. So in this section, let me first briefly show the form of each function, and then move on to the propagations.

    Activation Families


  • sigmoid : $\sigma(x) = \frac{1}{1 + e^{-x}}$
  • tanh : $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
  • ReLU : $\mathrm{ReLU}(x) = \max(0, x)$
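
    Written with NumPy, the three functions above look like this (a plain sketch, not tied to any framework):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        return np.tanh(x)              # NumPy already provides tanh

    def relu(x):
        return np.maximum(0.0, x)      # element-wise max(0, x)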
    

    I will pick ReLU as the activation function in this article.
    So let's check its forward prop and backprop.

    forward prop


    a_{ij} = max(0, x_{ij})
    

    backprop


    \frac{\partial E}{\partial x_{ij}} = \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial x_{ij}}
    = \left\{
    \begin{array}{ll}
    \frac{\partial E}{\partial a^{(k)}_{ij}} & (x_{ij} > 0) \\
    0 & (otherwise)
    \end{array}
    \right.
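
    A short NumPy sketch of both passes; the backward pass simply reuses the mask of positive inputs (function names are mine for illustration):

    import numpy as np

    def relu_forward(x):
        return np.maximum(0.0, x)

    def relu_backward(delta, x):
        # delta is dE/da; the gradient passes through only where x > 0
        return delta * (x > 0)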
    

    3. Max Pooling



    Forward prop


    a_{ij} = \max_{s,t} \, x_{(i+s)(j+t)}
    

    where $s \in \{0, 1, \dots, l-1\}$, $t \in \{0, 1, \dots, l-1\}$, and $l$ is the filter size.

    Backward prop


    \frac{\partial E}{\partial x_{(i+s)(j+t)}} = \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial x_{(i+s)(j+t)}}
    = \left\{
    \begin{array}{ll}
    \frac{\partial E}{\partial a^{(k)}_{ij}} & (a^{(k)}_{ij} = x_{(i+s)(j+t)}) \\
    0 & (otherwise)
    \end{array}
    \right.
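
    A minimal NumPy sketch of the usual non-overlapping case (stride equal to the window size $l$, input size assumed divisible by $l$); the backward pass routes the error only to the position that attained the max:

    import numpy as np

    def maxpool_forward(x, l):
        M, N = x.shape
        a = np.zeros((M // l, N // l))
        for i in range(M // l):
            for j in range(N // l):
                a[i, j] = np.max(x[i*l:(i+1)*l, j*l:(j+1)*l])
        return a

    def maxpool_backward(delta, x, l):
        dx = np.zeros_like(x)
        for i in range(delta.shape[0]):
            for j in range(delta.shape[1]):
                patch = x[i*l:(i+1)*l, j*l:(j+1)*l]
                s, t = np.unravel_index(np.argmax(patch), patch.shape)
                dx[i*l + s, j*l + t] = delta[i, j]   # only the max position receives the error
        return dx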
    

    Convolutional Layer (Multi-Channel)



    So far, we haven't considered multi-channel convolutional layers.
    But since we have covered the single-channel connections, it is a good time to move on to more practical nets.

    This is the conceptual image of multi-channel convolutional layers.

    Forward-prop\\
    a^{(k)}_{ij} = \sum_c a^{(k, c)}_{ij} = \sum_c \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} W^{(k, c)}_{st} x^{c}_{(i+s)(j+t)} + b^{(k)}\\
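
    A sketch of this forward pass with NumPy for one kernel $k$, with x of shape (C, M, N) and W of shape (C, m, n); the shapes and names are my own assumptions for illustration:

    import numpy as np

    def conv2d_forward_multichannel(x, W, b):
        # x: (C, M, N) input, W: (C, m, n) kernel k, b: scalar bias of kernel k
        C, M, N = x.shape
        _, m, n = W.shape
        a = np.zeros((M - m + 1, N - n + 1))
        for i in range(M - m + 1):
            for j in range(N - n + 1):
                # sum over channels c and offsets s, t, then add the bias once
                a[i, j] = np.sum(W * x[:, i:i + m, j:j + n]) + b
        return a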
    

    Just as in the forward-prop, the backward-prop only requires adding the channel index $c$ (and summing over it where it appears) to the single-channel equations.
    ** $c$ is the index of the channel.

    Updating Parameter (W) and Backprop to Previous Layer in Multi-Channel



    The multi-channel convolutional layer also has two aspects: one for updating its weights and the other for propagating the error to the previous layer.

    updating parameters


    \frac{\partial E}{\partial W^{(k, c)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial W^{(k, c)}_{st}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} x^{c}_{(i+s)(j+t)}\\
    
    \frac{\partial E}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}} \frac{\partial a^{(k)}_{ij}}{\partial b^{(k)}} = \sum^{M-m}_{i=0} \sum^{N-n}_{j=0} \frac{\partial E}{\partial a^{(k)}_{ij}}
    

    backprop to previous layer


    \frac{\partial E}{\partial x^c_{ij}} = \sum_k \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} \frac{\partial a^{(k)}_{(i-s)(j-t)}}{\partial x^c_{ij}}
    = \sum_k \sum^{m-1}_{s=0} \sum^{n-1}_{t=0} \frac{\partial E}{\partial a^{(k)}_{(i-s)(j-t)}} W^{(k, c)}_{st}
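
    A sketch of the whole multi-channel backward pass in NumPy, with W of shape (K, C, m, n) and delta of shape (K, M-m+1, N-n+1) holding $\partial E / \partial a^{(k)}$ (shapes and names are my own assumptions):

    import numpy as np

    def conv2d_backward_multichannel(x, W, delta):
        K, C, m, n = W.shape
        H, V = delta.shape[1], delta.shape[2]
        dW = np.zeros_like(W)
        db = np.zeros(K)
        dx = np.zeros_like(x)
        for k in range(K):
            db[k] = np.sum(delta[k])                 # dE/db^{(k)}
            for c in range(C):
                for s in range(m):
                    for t in range(n):
                        # dE/dW^{(k,c)}_{st} = sum_{i,j} delta^{(k)}_{ij} * x^c_{(i+s)(j+t)}
                        dW[k, c, s, t] = np.sum(delta[k] * x[c, s:s + H, t:t + V])
            for i in range(H):
                for j in range(V):
                    # scatter the error of output (i, j) back over the input patch it saw
                    dx[:, i:i + m, j:j + n] += delta[k, i, j] * W[k]
        return dW, db, dx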
    

    Advanced Material



    https://arxiv.org/pdf/1603.07285.pdf
    https://github.com/vdumoulin/conv_arithmetic
