transfer learning (3)

Now, building on the previous posts, let's actually do transfer learning.

VGG16 will be the baseline.

1. load vgg16 and pre-trained parameters

load model and check

import torchvision

vgg16 = torchvision.models.vgg16(pretrained = True)
vgg16
---------------------------------------------------------------------------
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

check pretrained parameters

vgg16.state_dict().keys()
------------------------------------------------------------------------------
odict_keys(['features.0.weight', 'features.0.bias', 'features.2.weight', 'features.2.bias', 'features.5.weight', 'features.5.bias', 'features.7.weight', 'features.7.bias', 'features.10.weight', 'features.10.bias', 'features.12.weight', 'features.12.bias', 'features.14.weight', 'features.14.bias', 'features.17.weight', 'features.17.bias', 'features.19.weight', 'features.19.bias', 'features.21.weight', 'features.21.bias', 'features.24.weight', 'features.24.bias', 'features.26.weight', 'features.26.bias', 'features.28.weight', 'features.28.bias', 'classifier.0.weight', 'classifier.0.bias', 'classifier.3.weight', 'classifier.3.bias', 'classifier.6.weight', 'classifier.6.bias'])
vgg16.state_dict()['features.0.weight']
-----------------------------------------------------------------------------
tensor([[[[-5.5373e-01,  1.4270e-01,  5.2896e-01],
          [-5.8312e-01,  3.5655e-01,  7.6566e-01],
          [-6.9022e-01, -4.8019e-02,  4.8409e-01]],
          ....................................................................
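Rather than the raw values, the weight shape is often more informative; a quick extra check (my addition, not in the original post):

vgg16.state_dict()['features.0.weight'].shape   # torch.Size([64, 3, 3, 3]): 64 filters of 3x3 over 3 input channels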

2. modify (add) model architecture

The conv layers will be frozen, so we'll beef up the fc side a bit.
First, look at just the fc part.

vgg16.classifier
------------------------------------------------------------------------------
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)

We are going to change this part.
The input size of the first Linear layer is determined by the output size of the conv part.
So let's pass the input we want through only the conv layers.

vgg16.features(train_input).shape
------------------------------------------------------------------------------
torch.Size([32, 512, 7, 7])
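If train_input from the earlier post is not in scope, a dummy tensor answers the same question; a minimal sketch, assuming 224 × 224 RGB inputs:

import torch

dummy = torch.randn(1, 3, 224, 224)    # fake batch of one image
vgg16.features(dummy).shape             # torch.Size([1, 512, 7, 7])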

So the first Linear should take 512 × 7 × 7 features per sample.
The original VGG16 actually has an AdaptiveAvgPool2d(output_size=(7, 7)) between the conv and fc parts, but we won't use it here.
So let's change, add, and delete the relevant pieces.
Since only the conv part will be kept, we can either delete everything after it or pull out just the conv part.
Deleting it:

del vgg16.avgpool
del vgg16.classifier

Extracting just the features:

only_conv = vgg16.features

Both ways keep the pretrained parameters. (If they didn't, apply could be used to reinitialize.)
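A quick sanity check of that claim (my own sketch, not from the post): extracting vgg16.features only copies a reference, so the parameter tensors are literally shared.

only_conv = vgg16.features
only_conv[0].weight is vgg16.features[0].weight   # True: the very same tensor object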
There is a slight difference between the two, though.

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
==========================================================================================
Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)

Both take the front part of the model in order; the only difference is whether the 'features' wrapper level is kept.
Most examples on Google do it the second way, so I'll try the first. Conveniently, the block is already named 'features'. (It is called features because that part does the feature extraction.)
add flatten

import torch.nn as nn

class Flatten(nn.Module):
    def forward(self, input):
        # collapse everything except the batch dimension
        return input.view(input.size(0), -1)
vgg16.add_module('flatten', Flatten())
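For reference, PyTorch (1.2+) also ships an equivalent built-in, so the custom class is optional:

vgg16.add_module('flatten', nn.Flatten())   # start_dim=1 by default, so the batch dim is kept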

add classifier

classifier = nn.Sequential(nn.Linear(in_features = 512*7*7, out_features = 512*7, bias = True),   # 512*7*7 = 25088 features per sample (batch dim excluded)
                           nn.ReLU(),
                           nn.Dropout(p = 0.5),
                           nn.Linear(in_features = 512*7, out_features = 512, bias = True),
                           nn.ReLU(),
                           nn.Dropout(p = 0.5),
                           nn.Linear(in_features = 512, out_features = 100, bias = True))
vgg16.add_module('classifier', classifier)

full model

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (flatten): Flatten()
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=3584, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=3584, out_features=512, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=512, out_features=100, bias=True)
  )
)

Then I called vgg16(train_input) and got an error.

AttributeError: 'VGG' object has no attribute 'avgpool'
---------------------------------------------------------------------------
~/.conda/envs/hyeokjong2/lib/python3.7/site-packages/torchvision/models/vgg.py in forward(self, x)
     48     def forward(self, x: torch.Tensor) -> torch.Tensor:
     49         x = self.features(x)
---> 50         x = self.avgpool(x)
     51         x = torch.flatten(x, 1)
     52         x = self.classifier(x)

Looking at it, the problem comes from reusing the VGG object directly: torchvision's VGG hard-codes self.avgpool in its forward (as the traceback shows), and we forcibly changed the structure underneath it.
So it seems we have to copy only the parts we want into a new model class.
That is probably why nobody does it this way... :(

Let's copy just the conv part from the front.

vgg16 = torchvision.models.vgg16(pretrained = True)
vgg16_conv = vgg16.features

Now let's build the model class.

class vgg16__(nn.Module):
    def __init__(self, conv_seq, flatten, classifier):
        super(vgg16__, self).__init__()
        self.conv_seq = conv_seq        # pretrained conv stack from torchvision's VGG16
        self.flatten = flatten
        self.classifier = classifier
    def forward(self, x):
        x = self.conv_seq(x)
        x = self.flatten(x)
        x = self.classifier(x)
        return x

class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

classifier = nn.Sequential(nn.Linear(in_features = 512*7*7, out_features = 512*7, bias = True),
                           nn.ReLU(),
                           nn.Dropout(p = 0.5),
                           nn.Linear(in_features = 512*7, out_features = 512, bias = True),
                           nn.ReLU(),
                           nn.Dropout(p = 0.5),
                           nn.Linear(in_features = 512, out_features = 100, bias = True))
vgg16_new = vgg16__(vgg16_conv, Flatten(), classifier)
vgg16_new
------------------------------------------------------------------------------
vgg16__(
  (conv_seq): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (flatten): Flatten()
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=3584, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=3584, out_features=512, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=512, out_features=100, bias=True)
  )
)

Seen like this, it looks just like the version that errored above. :(
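Incidentally, since forward here is just a straight pipeline, the same model could also be assembled without a custom class; a minimal sketch reusing the pieces defined above:

vgg16_seq = nn.Sequential(vgg16_conv, Flatten(), classifier)   # equivalent assembly, sketch only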

vgg16_new(train_input).shape
-----------------------------------------------------------------------
torch.Size([32, 100])

It works!
Now let's look at the structure.

import pytorch_model_summary
from torchinfo import summary

model = vgg16_new
x = train_input
print(pytorch_model_summary.summary(model, x, show_input=True))
print('!@#'*40)   # just a visual divider between the two summaries
summary(model, input_size = x.shape)

There are far too many parameters.
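A quick way to actually count them (my own sketch):

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'total: {total:,}  trainable: {trainable:,}')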

3. check autograd, then turn off the conv part's gradients

Fine-tuning doesn't come with a fixed rule about exactly which layers to retrain.
I'll freeze the conv layers and train only the linear part.

Let's check autograd.

for name, parameter in model.named_parameters():
    print(name)
    print(parameter.requires_grad)

They are all True.
Let's look at the parameters themselves too.

for name, parameter in model.named_parameters():
    print(name)
    print(parameter)

It looks the same as state_dict() did; the pretrained parameters are all in place.
Now let's freeze the whole conv part and check again.

for name, parameter in model.named_parameters():
    if name[:4] == 'conv':            # matches 'conv_seq.*', i.e. every conv parameter
        parameter.requires_grad = False
for name, parameter in model.named_parameters():
    print(name, parameter.requires_grad)

Now that the conv part is frozen, it will not be updated.
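One follow-up worth noting (my addition, not from the post): when the optimizer is created, it is common to hand it only the trainable parameters, so the frozen conv weights are not tracked at all. A sketch with an arbitrary learning rate:

import torch.optim as optim

# only parameters with requires_grad = True reach the optimizer
optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr = 1e-4)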
