Deriving the Backward Propagation of the Affine Layer
0. Background
\begin{align}
\frac{\partial L}{\partial \boldsymbol{X}} &= \frac{\partial L}{\partial \boldsymbol{Y}}\cdot \boldsymbol{W}^T \\
\frac{\partial L}{\partial \boldsymbol{W}} &=
\boldsymbol{X}^T \cdot \frac{\partial L}{\partial \boldsymbol{Y}}
\end{align}
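The rest of this article derives these two formulas by writing everything out componentwise. As a preview, here is a minimal NumPy sketch of an Affine layer that applies them (the class and variable names are my own, not from the original):

```python
import numpy as np

class Affine:
    """Minimal Affine (fully connected) layer: Y = X . W."""

    def __init__(self, W):
        self.W = W
        self.X = None

    def forward(self, X):
        self.X = X            # cache the input for the backward pass
        return X @ self.W     # Y = X . W

    def backward(self, dY):
        # dL/dX = dL/dY . W^T  and  dL/dW = X^T . dL/dY
        self.dW = self.X.T @ dY
        return dY @ self.W.T
```

`forward` caches $\boldsymbol{X}$ because the weight gradient needs it during the backward pass.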
1. In low dimensions, work through the components directly, ignoring activation functions
Assume a 2D input $\boldsymbol{x} = (x_1\; x_2)$.
To give layer 1 a 3-dimensional output, multiply this input on the right by a $(2, 3)$ matrix:
\begin{align}
\boldsymbol{W} = \begin{pmatrix}
w_{11} & w_{21} & w_{31} \\
w_{12} & w_{22} & w_{32}
\end{pmatrix}
\end{align}
This yields the output $\boldsymbol{Y}$:
\begin{align}
\boldsymbol{Y} &= \boldsymbol{X} \cdot \boldsymbol{W} \\
&=
\begin{pmatrix}
x_1 & x_2
\end{pmatrix}
\begin{pmatrix}
w_{11} & w_{21} & w_{31} \\
w_{12} & w_{22} & w_{32}
\end{pmatrix} \\
&=
\begin{pmatrix}
w_{11}x_1+w_{12}x_2 & w_{21}x_1+w_{22}x_2 & w_{31}x_1+w_{32}x_2
\end{pmatrix} \\
&=
\begin{pmatrix}
y_1 & y_2 & y_3
\end{pmatrix} \tag{1.1}
\end{align}
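Equation $(1.1)$ can be checked with concrete numbers (the values below are arbitrary):

```python
import numpy as np

x1, x2 = 1.0, 2.0
X = np.array([[x1, x2]])

# W laid out exactly as in the text:
# row 1 = (w11, w21, w31), row 2 = (w12, w22, w32)
W = np.array([[0.1, 0.3, 0.5],
              [0.2, 0.4, 0.6]])

Y = X @ W
# componentwise, y_j = w_j1 * x1 + w_j2 * x2:
Y_manual = np.array([[0.1 * x1 + 0.2 * x2,
                      0.3 * x1 + 0.4 * x2,
                      0.5 * x1 + 0.6 * x2]])
```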
Now consider the partial derivative of the loss function $L$ with respect to the input $\boldsymbol{X}$. Since $x_1$ and $x_2$ each appear in all of $y_1, y_2, y_3$, we have
\begin{align}
\frac{\partial L}{\partial \boldsymbol{X}} &=
\begin{pmatrix}
\frac{\partial L}{\partial x_1} & \frac{\partial L}{\partial x_2}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial L}{\partial \boldsymbol{Y}} \cdot \frac{\partial \boldsymbol{Y}}{\partial x_1} & \frac{\partial L}{\partial \boldsymbol{Y}} \cdot \frac{\partial \boldsymbol{Y}}{\partial x_2}
\end{pmatrix}
\end{align}
where
\begin{align}
\frac{\partial L}{\partial \boldsymbol{Y}} \cdot \frac{\partial \boldsymbol{Y}}{\partial x_1} =
\begin{pmatrix}
\frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} & \frac{\partial L}{\partial y_3}
\end{pmatrix}
\cdot
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} \\
\frac{\partial y_2}{\partial x_1} \\
\frac{\partial y_3}{\partial x_1}
\end{pmatrix}
\end{align}
Therefore,
\begin{align}
\frac{\partial L}{\partial \boldsymbol{X}} &=
\begin{pmatrix}
\frac{\partial L}{\partial y_1} \frac{\partial y_1}{\partial x_1} +
\frac{\partial L}{\partial y_2} \frac{\partial y_2}{\partial x_1} +
\frac{\partial L}{\partial y_3} \frac{\partial y_3}{\partial x_1} &
\frac{\partial L}{\partial y_1} \frac{\partial y_1}{\partial x_2} +
\frac{\partial L}{\partial y_2} \frac{\partial y_2}{\partial x_2} +
\frac{\partial L}{\partial y_3} \frac{\partial y_3}{\partial x_2}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial L}{\partial y_1} w_{11} +
\frac{\partial L}{\partial y_2} w_{21} +
\frac{\partial L}{\partial y_3} w_{31} &
\frac{\partial L}{\partial y_1} w_{12} +
\frac{\partial L}{\partial y_2} w_{22} +
\frac{\partial L}{\partial y_3} w_{32}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} & \frac{\partial L}{\partial y_3}
\end{pmatrix}
\begin{pmatrix}
w_{11} & w_{12} \\
w_{21} & w_{22} \\
w_{31} & w_{32}
\end{pmatrix} \\
&= \frac{\partial L}{\partial \boldsymbol{Y}}\cdot \boldsymbol{W}^T
\end{align}
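This formula can be sanity-checked against a numerical gradient. Here I use the toy loss $L = \sum_j y_j$, for which $\partial L/\partial \boldsymbol{Y}$ is a row of ones (the loss choice is mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1, 2))
W = rng.normal(size=(2, 3))

def loss(X):
    return (X @ W).sum()      # toy loss: L = y1 + y2 + y3

dY = np.ones((1, 3))          # dL/dY for this loss
dX_formula = dY @ W.T         # the derived formula dL/dX = dL/dY . W^T

# central-difference numerical gradient with respect to X
eps = 1e-6
dX_numeric = np.zeros_like(X)
for i in range(X.shape[1]):
    Xp, Xm = X.copy(), X.copy()
    Xp[0, i] += eps
    Xm[0, i] -= eps
    dX_numeric[0, i] = (loss(Xp) - loss(Xm)) / (2 * eps)
```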
On the other hand, for the partial derivative of the loss function $L$ with respect to the weights $\boldsymbol{W}$,
\begin{align}
\frac{\partial L}{\partial \boldsymbol{W}} &=
\begin{pmatrix}
\frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{31}} \\
\frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{32}}
\end{pmatrix}
\end{align}
Here, from expression $(1.1)$, $w_{11}$ and $w_{12}$ appear only in $y_1$, $w_{21}$ and $w_{22}$ only in $y_2$, and $w_{31}$ and $w_{32}$ only in $y_3$. Therefore,
\begin{align}
\frac{\partial L}{\partial w_{11}} &= \frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{11}} \\
\frac{\partial L}{\partial w_{12}} &= \frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{12}} \\
\frac{\partial L}{\partial w_{21}} &= \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{21}} \\
\frac{\partial L}{\partial w_{22}} &= \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{22}} \\
\frac{\partial L}{\partial w_{31}} &= \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{31}} \\
\frac{\partial L}{\partial w_{32}} &= \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{32}}
\end{align}
Therefore, $\frac{\partial L}{\partial \boldsymbol{W}}$ becomes
\begin{align}
\frac{\partial L}{\partial \boldsymbol{W}} &=
\begin{pmatrix}
\frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{31}} \\
\frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{32}}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{11}} &
\frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{21}} &
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{31}} \\
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{12}} & \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{22}} &
\frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{32}}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial L}{\partial y_1}x_1 &
\frac{\partial L}{\partial y_2}x_1 &
\frac{\partial L}{\partial y_3}x_1 \\
\frac{\partial L}{\partial y_1}x_2 &
\frac{\partial L}{\partial y_2}x_2 &
\frac{\partial L}{\partial y_3}x_2
\end{pmatrix} \\
&= \begin{pmatrix}
x_1 \\
x_2
\end{pmatrix}
\begin{pmatrix}
\frac{\partial L}{\partial y_1} &
\frac{\partial L}{\partial y_2} &
\frac{\partial L}{\partial y_3}
\end{pmatrix} \\
&= \boldsymbol{X}^T \cdot \frac{\partial L}{\partial \boldsymbol{Y}}
\end{align}
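The same numerical check works for the weight gradient (again with the toy loss $L = \sum_j y_j$, an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1, 2))
W = rng.normal(size=(2, 3))

def loss(W):
    return (X @ W).sum()      # toy loss: L = y1 + y2 + y3

dY = np.ones((1, 3))          # dL/dY for this loss
dW_formula = X.T @ dY         # the derived formula dL/dW = X^T . dL/dY

# central-difference numerical gradient with respect to W
eps = 1e-6
dW_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        dW_numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)
```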
In summary,
\begin{align}
\frac{\partial L}{\partial \boldsymbol{W}} &=
\boldsymbol{X}^T \cdot \frac{\partial L}{\partial \boldsymbol{Y}}
\end{align}
2. Considering activation functions and a second layer
In Section 1, we ignored the activation function of each layer and did not consider a second or deeper layer. Here I want to bring in a second layer and the activation functions as well.
Let $h$ be the layer-1 activation function and $\sigma$ the layer-2 (output-layer) activation function.
Viewed as a whole, the result looks like this:
\begin{align}
L = \sigma(h(\boldsymbol{X} \cdot \boldsymbol{W}) \cdot \boldsymbol{W}^{(2)})
\end{align}
Let us look at it piece by piece. The input is $\boldsymbol{X}$:
\begin{align}
\boldsymbol{X} = (x_1\; x_2)
\end{align}
Layer-1 input:
\begin{align}
\boldsymbol{Y} = \boldsymbol{X}\cdot \boldsymbol{W}
\end{align}
Layer-1 output (after the activation function):
\begin{align}
h(\boldsymbol{Y}) &= h(\boldsymbol{X}\cdot \boldsymbol{W}) \\
&= \begin{pmatrix}
h(y_1) & h(y_2) & h(y_3)
\end{pmatrix}
\end{align}
Layer-2 (output layer) input:
\begin{align}
z &= h(\boldsymbol{Y})\cdot \boldsymbol{W}^{(2)} \\
&=
\begin{pmatrix}
h(y_1) & h(y_2) & h(y_3)
\end{pmatrix}
\begin{pmatrix}
w_1^{(2)} \\
w_2^{(2)} \\
w_3^{(2)}
\end{pmatrix} \\
&= w_1^{(2)}h(y_1) + w_2^{(2)}h(y_2) + w_3^{(2)}h(y_3)
\end{align}
Layer-2 (output layer) output:
\begin{align}
L &= \sigma (z) \\
&= \sigma (w_1^{(2)}h(y_1) + w_2^{(2)}h(y_2) + w_3^{(2)}h(y_3))
\end{align}
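The forward pass above can be sketched directly. Since the article leaves $h$ and $\sigma$ abstract, I pick the sigmoid and $\tanh$ as arbitrary stand-ins:

```python
import numpy as np

def h(a):                       # layer-1 activation (sigmoid, my choice)
    return 1.0 / (1.0 + np.exp(-a))

sigma = np.tanh                 # output activation (also my choice)

X = np.array([[1.0, 2.0]])
W = np.array([[0.1, 0.3, 0.5],
              [0.2, 0.4, 0.6]])
W2 = np.array([[0.7], [0.8], [0.9]])   # W^(2), shape (3, 1)

Y = X @ W          # layer-1 input
Z = h(Y)           # layer-1 output: (h(y1) h(y2) h(y3))
z = Z @ W2         # layer-2 input: w1 h(y1) + w2 h(y2) + w3 h(y3)
L = sigma(z)       # layer-2 output
```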
The partial derivatives with respect to $\boldsymbol{X}$ and $\boldsymbol{W}$ go just as in Section 1, but I list them again:
\begin{align}
\frac{\partial \sigma}{\partial \boldsymbol{X}} &=
\begin{pmatrix}
\frac{\partial \sigma}{\partial x_1} & \frac{\partial \sigma}{\partial x_2}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial \sigma}{\partial \boldsymbol{Y}} \cdot \frac{\partial \boldsymbol{Y}}{\partial x_1} & \frac{\partial \sigma}{\partial \boldsymbol{Y}} \cdot \frac{\partial \boldsymbol{Y}}{\partial x_2}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial \sigma}{\partial y_1} \frac{\partial y_1}{\partial x_1} +
\frac{\partial \sigma}{\partial y_2} \frac{\partial y_2}{\partial x_1} +
\frac{\partial \sigma}{\partial y_3} \frac{\partial y_3}{\partial x_1} &
\frac{\partial \sigma}{\partial y_1} \frac{\partial y_1}{\partial x_2} +
\frac{\partial \sigma}{\partial y_2} \frac{\partial y_2}{\partial x_2} +
\frac{\partial \sigma}{\partial y_3} \frac{\partial y_3}{\partial x_2}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial \sigma}{\partial y_1} w_{11} +
\frac{\partial \sigma}{\partial y_2} w_{21} +
\frac{\partial \sigma}{\partial y_3} w_{31} &
\frac{\partial \sigma}{\partial y_1} w_{12} +
\frac{\partial \sigma}{\partial y_2} w_{22} +
\frac{\partial \sigma}{\partial y_3} w_{32}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial \sigma}{\partial y_1} & \frac{\partial \sigma}{\partial y_2} & \frac{\partial \sigma}{\partial y_3}
\end{pmatrix}
\begin{pmatrix}
w_{11} & w_{12} \\
w_{21} & w_{22} \\
w_{31} & w_{32}
\end{pmatrix} \\
&= \frac{\partial \sigma}{\partial \boldsymbol{Y}}\cdot \boldsymbol{W}^T \\
\frac{\partial \sigma}{\partial \boldsymbol{W}} &=
\begin{pmatrix}
\frac{\partial \sigma}{\partial w_{11}} & \frac{\partial \sigma}{\partial w_{21}} & \frac{\partial \sigma}{\partial w_{31}} \\
\frac{\partial \sigma}{\partial w_{12}} & \frac{\partial \sigma}{\partial w_{22}} & \frac{\partial \sigma}{\partial w_{32}}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial \sigma}{\partial y_1}\frac{\partial y_1}{\partial w_{11}} &
\frac{\partial \sigma}{\partial y_2}\frac{\partial y_2}{\partial w_{21}} &
\frac{\partial \sigma}{\partial y_3}\frac{\partial y_3}{\partial w_{31}} \\
\frac{\partial \sigma}{\partial y_1}\frac{\partial y_1}{\partial w_{12}} &
\frac{\partial \sigma}{\partial y_2}\frac{\partial y_2}{\partial w_{22}} &
\frac{\partial \sigma}{\partial y_3}\frac{\partial y_3}{\partial w_{32}}
\end{pmatrix} \\
&=
\begin{pmatrix}
\frac{\partial \sigma}{\partial y_1}x_1 &
\frac{\partial \sigma}{\partial y_2}x_1 &
\frac{\partial \sigma}{\partial y_3}x_1 \\
\frac{\partial \sigma}{\partial y_1}x_2 &
\frac{\partial \sigma}{\partial y_2}x_2 &
\frac{\partial \sigma}{\partial y_3}x_2
\end{pmatrix} \\
&=
\begin{pmatrix}
x_1 \\
x_2
\end{pmatrix}
\begin{pmatrix}
\frac{\partial \sigma}{\partial y_1} &
\frac{\partial \sigma}{\partial y_2} &
\frac{\partial \sigma}{\partial y_3}
\end{pmatrix} \\
&= \boldsymbol{X}^T \cdot \frac{\partial \sigma}{\partial \boldsymbol{Y}}
\end{align}
As an aside, written componentwise this is
\begin{align}
\frac{\partial L}{\partial w_{ji}} =
\frac{\partial L}{\partial y_j} \frac{\partial y_j}{\partial w_{ji}}
\end{align}
3. Making it more general
In Sections 1 and 2, the number of layers was fixed and each layer had very few dimensions. More generally, let us focus on three consecutive layers $i$, $j$, and $k$.
Here the matrices are written out componentwise:
\begin{align}
a_j^{(j)} &= \sum_i w_{ji}^{(j)}z_i^{(i)} \\
z_j^{(j)} &= h(a_j^{(j)})
\end{align}
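In code, one layer of this forward recursion is just a matrix-vector product followed by the activation, with $W$ of shape $(n_j, n_i)$ so that row $j$ holds the weights $w_{ji}$ (the function name is mine):

```python
import numpy as np

def forward_layer(W, z_prev, h):
    """a_j = sum_i w_ji * z_i ; z_j = h(a_j)."""
    a = W @ z_prev          # pre-activations of the current layer
    return a, h(a)
```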
For the weight $w_{ji}^{(j)}$: since $w_{ji}^{(j)}$ appears only in $a_j^{(j)}$,
\begin{align}
\frac{\partial L}{\partial w_{ji}^{(j)}} &= \frac{\partial L}{\partial a_j^{(j)}}\frac{\partial a_j^{(j)}}{\partial w_{ji}^{(j)}} \\
&= \frac{\partial L}{\partial a_j^{(j)}}\frac{\partial}{\partial w_{ji}^{(j)}} ( \sum_i w_{ji}^{(j)}z_i^{(i)} ) \\
&= \frac{\partial L}{\partial a_j^{(j)}}z_i^{(i)}
\end{align}
Next, for the layer-$j$ input $a_j^{(j)}$: since a change in $a_j^{(j)}$ can change the error function only through the resulting changes in the $a_k^{(k)}$,
\begin{align}
\frac{\partial L}{\partial a_j^{(j)}} &=
\sum_k \frac{\partial L}{\partial a_k^{(k)}}\frac{\partial a_k^{(k)}}{\partial a_j^{(j)}} \\
&= \sum_k \frac{\partial L}{\partial a_k^{(k)}}\frac{\partial}{\partial a_j^{(j)}} ( \sum_j w_{kj}^{(k)}z_j^{(j)} ) \\
&= \sum_k \frac{\partial L}{\partial a_k^{(k)}} w_{kj}^{(k)} \frac{\partial h(a_j^{(j)})}{\partial a_j^{(j)}} \\
&= \frac{\partial h(a_j^{(j)})}{\partial a_j^{(j)}}\sum_k w_{kj}^{(k)} \frac{\partial L}{\partial a_k^{(k)}}
\end{align}
\end{align}
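The two results above are exactly the backpropagation recursion: define $\delta_j = \partial L/\partial a_j^{(j)}$, propagate it backwards with $\delta_j = h'(a_j)\sum_k w_{kj}\,\delta_k$, and read off the weight gradients as $\partial L/\partial w_{ji} = \delta_j z_i$. A sketch under my own shape convention, where the layer-$l$ weight matrix has shape $(n_l, n_{l-1})$:

```python
import numpy as np

def backward_deltas(a_list, W_list, h_prime, delta_last):
    """delta_j^(l) = h'(a_j^(l)) * sum_k w_kj^(l+1) * delta_k^(l+1)."""
    deltas = [None] * len(a_list)
    deltas[-1] = delta_last                     # dL/da for the last layer
    for l in range(len(a_list) - 2, -1, -1):
        deltas[l] = h_prime(a_list[l]) * (W_list[l + 1].T @ deltas[l + 1])
    return deltas

def weight_grads(z_list, deltas):
    """dL/dw_ji = delta_j * z_i, i.e. an outer product per layer."""
    return [np.outer(d, z) for z, d in zip(z_list, deltas)]
```

Here `z_list` holds each layer's input vector (the previous layer's activations), matching the $z_i^{(i)}$ in the derivation.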
which completes the derivation.
Reference
Original article: https://qiita.com/yuyasat/items/d9cdd4401221df5375b6