Crushing Deep Learning - CS231n Study Group (Lecture 5)

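One last review before the study session! I'm doing it with Sanghyun, my miracle-morning buddy who can't regulate his temperature.
The sun is clear and blazing, yet our skin is pale and we have dark circles under our eyes...
We started miracle mornings to get healthier... hey, hey~ ....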

Lecture 5 - Convolutional Neural Networks

(1) History of Neural Networks

1.

2.

3. Backpropagation (1986)

4. Reinvigorated research in Deep Learning (2006)

5. First Strong Results of NNs (2012)

  • Speech recognition, image recognition, ... deep NNs, including deep convolutional NNs for image recognition, dramatically reduced error rates

(2) History of Image Processing

1. Hubel & Wiesel

Topographical mapping in the cortex : nearby cells in cortex represent nearby regions in the visual field

Discovery of Hierarchical Organization :

2. Neocognitron - Fukushima (1980)

First example of a network architecture/model built on the idea of simple and complex cells (Hubel & Wiesel).

Alternating layers of simple and complex cells - simple (modifiable parameters) and complex on top (pooling - invariant to different minor modifications from simple cells)

3. Gradient-based learning applied to Document Recognition - LeCun et al., (1998)

Applied backpropagation and gradient-based learning to train convolutional NNs, which did well at recognizing documents, especially zip codes (actually used in postal services!)

But it was not there yet for more complex data :(

4. "AlexNet" ImageNet Classification with Deep Convolutional NNs (2012)

5. Today

  • ConvNets are used everywhere! Detection and segmentation
    (labeling every pixel and outlining objects)

ex) face-recognition, video classification, pose recognition (joints, ...), street sign recognition, aerial maps (segmenting streets, buildings), image captioning (writing sentence description of image), artwork created by NN

(3) Convolutional Layer

1. Fully Connected Layer

input - 32x32x3 image stretched out to a 3072x1 vector
weight matrix - 10x3072 weights
activation / output layer - 1x10
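
The shapes above can be checked with a small NumPy sketch. The random values, the zero bias, and the variable names are assumptions for illustration only; the notes only give the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 32x32x3 image, stretched out into a 3072-element vector.
image = rng.standard_normal((32, 32, 3))
x = image.reshape(-1)                 # shape (3072,)

# One row of weights per class: 10 x 3072.
W = rng.standard_normal((10, 3072))
b = np.zeros(10)                      # bias per class (assumed zero here)

# Each output is the dot product of one row of W with the whole input.
scores = W @ x + b                    # shape (10,)
print(x.shape, W.shape, scores.shape)
```

Every one of the 10 outputs looks at all 3072 inputs at once, which is the contrast with the convolutional layer below.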

2. Convolutional Layer

1. instead of stretching out the image into one long vector, we keep the dimensions.

Why is this any better? Isn't it only structurally different?

  • More efficient

  • input - dimensions are kept, 32x32x3 image (is the 3 because it is RGB?)
  • filter - filters always extend the full depth of input volume, 5x5x3

2. Dot product of this filter and a chunk of the image

  • first - overlay the filter on top of a spatial location of the image,

  • second - do the dot product - multiply each element of the filter with the corresponding element at that spatial location of the image, then sum

  • number of multiplications : 5 x 5 x 3

  • (W transpose * X) + bias
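
A minimal sketch of this single-location computation, assuming random values and a made-up bias of 0.1. It computes the result both ways: element-wise on the 5x5x3 chunk, and as a dot product of two stretched-out 75-element vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))   # input volume
w = rng.standard_normal((5, 5, 3))         # one filter, full depth
b = 0.1                                    # hypothetical bias value

# Overlay the filter on the top-left 5x5x3 chunk of the image.
chunk = image[0:5, 0:5, :]

# Element-wise multiply and sum: 5 * 5 * 3 = 75 multiplications.
out_elementwise = np.sum(w * chunk) + b

# Same thing after stretching both into 75-element vectors: w^T x + b.
out_vector = w.reshape(-1) @ chunk.reshape(-1) + b

print(w.size, np.isclose(out_elementwise, out_vector))
```

Both forms give the same single number, which becomes one entry of the activation map.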

    Questions

    1. When we do the dot product, do we turn the 5x5x3 filter into a vector?
    Yes. You can think of it as plopping the filter on and doing element-wise multiplication at each location, but stretching out the filter and the matching chunk of the input volume into vectors gives the same result.

    2. Any intuition for why this is W transpose?
    No intuition. It is just notation to make the math work as a dot product of 1D vectors.

    3. Is W not 5x5x3 but 75x1?
    Yes, in this notation the filter is stretched out into a 75x1 vector before the dot product.

3. Overlaying filter on top of image - convolve (slide) over all spatial locations

  • start from the upper-left corner and center the filter on top of every pixel in this input volume
  • each filter is looking for a certain type of template or concept in the input volume

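Is this filter something like an activation function? Or just a linear layer? It is not a layer. A single layer has multiple filters.
Then is each filter for one classification? e.g., a filter for the probability that the image is a horse, a filter for the probability that the image is a car, ...
Is the activation map 28x28x1 because the filter is placed repeatedly without sticking out past the image, so 32 - 5 + 1 = 28? Is that okay? Every element has a weight, but don't some elements (near the edges) get included less often? Did I misunderstand... lol

Yep, this is right.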

do this for multiple filters!

  • number of activation maps = number of filters !
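
A naive sketch of the sliding step, assuming stride 1, no padding, and 6 filters (the filter count and the helper name `conv_forward` are arbitrary choices for illustration). It is written with plain loops for readability, not speed.

```python
import numpy as np

def conv_forward(image, filters, biases):
    """Naive stride-1, no-padding convolution.

    image: (H, W, C), filters: (K, F, F, C), biases: (K,)
    Returns an (H - F + 1, W - F + 1, K) stack of activation maps.
    """
    H, W, C = image.shape
    K, F, _, _ = filters.shape
    out_h, out_w = H - F + 1, W - F + 1
    out = np.zeros((out_h, out_w, K))
    for k in range(K):                      # one activation map per filter
        for i in range(out_h):
            for j in range(out_w):
                chunk = image[i:i + F, j:j + F, :]
                out[i, j, k] = np.sum(chunk * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filters = rng.standard_normal((6, 5, 5, 3))   # 6 filters, each 5x5x3
biases = np.zeros(6)

maps = conv_forward(image, filters, biases)
print(maps.shape)   # (28, 28, 6)
```

The last dimension of the output equals the number of filters, and each 28x28 slice is one activation map.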

4. Preview of Convnet

Example of Activation Maps

  • top : row of 5x5 filter
  • filter with red box - oriented edge template. Sliding it over an image gives white (high) values at the locations of edges (where this template is more present in the image)
  • filter slides over image, computes dot product at each location

Example of car image processing

3. Spatial Dimensions - Closer Look of Sliding

7x7 image with 3x3 filter

stride n - roughly, the filter moves n cells at a time

stride 1 : 5x5 output size
stride 2 : 3x3 output size
stride 3 : doesn't fit! asymmetrical outputs - not all designs are possible

Output Sizes
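
The three stride cases above follow the standard formula (N - F) / stride + 1 for an NxN input and FxF filter without padding. A small sketch, where the function name and the choice to return None for a non-fitting stride are my own:

```python
def conv_output_size(n, f, stride):
    """Output width for an n x n input and f x f filter, no padding.

    Returns None when the filter does not fit the input evenly.
    """
    if (n - f) % stride != 0:
        return None
    return (n - f) // stride + 1

for stride in (1, 2, 3):
    print(stride, conv_output_size(7, 3, stride))
```

For the 7x7 image with a 3x3 filter this gives 5, 3, and "doesn't fit" for strides 1, 2, and 3.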

Zero Padding

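if you pad your pixels - draw a border of zeros around the image so the output size works out! This is what I was wondering about earlier!

Maintaining output size, applying filters to edges

After padding with a one-pixel border, the 7x7 input becomes 9x9,
and sliding a 3x3 filter over it gives a 7x7 output.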
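
A quick check of the 7x7 -> 9x9 -> 7x7 example using `np.pad`. The averaging kernel is a hypothetical filter chosen only so the sketch has something to slide; any 3x3 filter gives the same shapes.

```python
import numpy as np

image = np.arange(49, dtype=float).reshape(7, 7)

# Add a one-pixel border of zeros on every side: 7x7 -> 9x9.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# Slide a 3x3 filter with stride 1 over the padded image.
f = 3
kernel = np.ones((f, f)) / 9.0        # hypothetical averaging filter
out_size = padded.shape[0] - f + 1
out = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        out[i, j] = np.sum(padded[i:i + f, j:j + f] * kernel)

print(padded.shape, out.shape)   # (9, 9) (7, 7)
```

With the border in place, the filter can also be centered on the edge pixels, so the output keeps the original 7x7 size.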

    Questions

    1. What's the actual number of outputs?
    In this case, 7x7x(number of filters).

    Where did the depth go...

    2. How does this connect to an input with depth?
    This example is drawn in 2D just to make it easy to see; with depth, the filter spans the full depth and the products are summed over it as well.

    3. Ways other than zero-padding exist.

    4. Non-square images?
    
  • size shrinking too fast means you are losing information - this is not good :(. Feels like the image quality degrading.
    Is padding what solves this? A shield against shrinking.
    (Though "shield against shrinking" is not quite what it means.)
  • padding size? whatever fits your model / image size & filter size
  • with padding - filter can be bigger than image
  • Can the activation map be larger than the image? Yes, you will pool it down anyway!!!!

Calculating Output Volume Size

Size : 32x32(zero-padding)x10(filters)

Number of Parameters : ((5x5x3) +1) x 10 = 760
(5x5x3 weights + 1 bias) x 10 filters
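
The numbers above can be reproduced with a short sketch. The notes give the 32x32 output, the 5x5x3 filters, and the 10 filters; the 32x32x3 input, stride 1, and padding of 2 are my assumptions about the setup, and the function name is made up.

```python
def conv_layer_summary(in_size, in_depth, f, num_filters, stride, pad):
    """Return (output volume shape, number of parameters) for a conv layer."""
    out_size = (in_size + 2 * pad - f) // stride + 1
    params_per_filter = f * f * in_depth + 1   # weights + one bias
    return (out_size, out_size, num_filters), params_per_filter * num_filters

shape, n_params = conv_layer_summary(32, 3, 5, 10, stride=1, pad=2)
print(shape, n_params)   # (32, 32, 10) 760
```

Note that the parameter count does not depend on the image size, only on the filter size, input depth, and number of filters.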

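1 x 1 Convolution Layers? YES!

Convolution picks up features from several pixels at once, so what is the point of 1x1? In 2D there is none.
But it is meaningful in 3D, which comes from the 'depth'!
(The boxes are from the figure.)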
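
To see why depth makes a 1x1 convolution meaningful, here is a sketch in which each 1x1 filter takes a dot product across the depth at every pixel. The 56x56x64 input and the 32 filters are illustrative sizes I picked, not numbers from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 64))     # illustrative input volume
w = rng.standard_normal((32, 64))         # 32 filters, each 1x1x64
b = np.zeros(32)

# At every pixel, take a 64-dimensional dot product with each filter.
out = np.einsum("hwc,kc->hwk", x, w) + b

print(out.shape)   # (56, 56, 32)
```

The spatial size is unchanged, but the depth goes from 64 to 32: the layer mixes channels at each pixel without looking at neighboring pixels.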

(4) ConvNet

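Pooling = downsampling: reduce the size while keeping the most characteristic elements (max pool)

  • keeps only one value per window
  • pooling filter
  • the output gets smaller, but let's just say it holds the meaningful information efficiently~~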
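
A minimal max-pooling sketch on a single 2D activation map, assuming a 2x2 window with stride 2 (a common setting, but an assumption here; the notes do not give the window size). The input values are made up.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over a 2D activation map."""
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Keep only the largest value in each window.
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
print(max_pool(x))
```

The 4x4 map shrinks to 2x2, with each output holding the strongest activation of its window. Pooling has no weights to learn, and it is applied to each activation map separately, so the depth stays the same.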

Advantage of convolutional NNs - 2D to 2D, not 1D - they preserve spatial context

Something to think about before Lecture 6?
Why do we use an activation function?
