[PyTorch] Transfer Learning
torch.save()
1️⃣ saves the model architecture 2️⃣ saves the parameters
0) Printing the model parameters: model.state_dict()
# Print each parameter's name and tensor size
for param_tensor in model.state_dict():
    print(param_tensor, model.state_dict()[param_tensor].size())
1) Saving and loading model parameters: model.load_state_dict()
import os

# Save the parameters
MODEL_PATH = "saved"
torch.save(model.state_dict(), os.path.join(MODEL_PATH, "model.pt"))

# Load the saved parameters
new_model = TheModelClass()
new_model.load_state_dict(torch.load(os.path.join(MODEL_PATH, "model.pt")))
The parameters are stored as an OrderedDict, so they can be inspected like a dictionary.
⇒ Rather than browsing the dictionary directly, torchsummary gives a far more useful view:
from torchsummary import summary
summary(model, (3, 224, 224))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 16, 111, 111] 448
BatchNorm2d-2 [-1, 16, 111, 111] 32
ReLU-3 [-1, 16, 111, 111] 0
MaxPool2d-4 [-1, 16, 55, 55] 0
Conv2d-5 [-1, 32, 27, 27] 4,640
BatchNorm2d-6 [-1, 32, 27, 27] 64
ReLU-7 [-1, 32, 27, 27] 0
MaxPool2d-8 [-1, 32, 13, 13] 0
Conv2d-9 [-1, 64, 6, 6] 18,496
BatchNorm2d-10 [-1, 64, 6, 6] 128
ReLU-11 [-1, 64, 6, 6] 0
MaxPool2d-12 [-1, 64, 3, 3] 0
Dropout-13 [-1, 576] 0
Linear-14 [-1, 1000] 577,000
Linear-15 [-1, 1] 1,001
================================================================
Total params: 601,809
Trainable params: 601,809
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): **5.53 # memory for activations during a forward/backward pass (the weights themselves are Params size)**
Params size (MB): 2.30
Estimated Total Size (MB): 8.40
----------------------------------------------------------------
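The state_dict save/load round trip above can be checked end to end. A minimal, self-contained sketch, using a tiny stand-in model (hypothetical; any nn.Module behaves the same) and a temp directory instead of the `saved` folder:

```python
import os
import tempfile

import torch
from torch import nn

# Tiny stand-in model (hypothetical, chosen so the sketch runs quickly)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Inspect parameter names and tensor sizes, as above
for param_tensor in model.state_dict():
    print(param_tensor, model.state_dict()[param_tensor].size())

# Save the state_dict, then load it into a freshly constructed model
path = os.path.join(tempfile.gettempdir(), "model.pt")
torch.save(model.state_dict(), path)

new_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
new_model.load_state_dict(torch.load(path))

# The reloaded model produces identical outputs
x = torch.randn(2, 4)
assert torch.equal(model(x), new_model(x))
```

Note that `load_state_dict` requires the new model to have the same architecture (same parameter names and shapes) as the one that was saved.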
2) Saving and loading architecture + parameters together (stored in pickle form): torch.load()
# Save the architecture and parameters together
torch.save(model, os.path.join(MODEL_PATH, "model_pickle.pt"))

# Load the architecture and parameters
model = torch.load(os.path.join(MODEL_PATH, "model_pickle.pt"))
torch.load() reconstructs the object itself, so a single line restores both the model and its parameters.
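A runnable sketch of this whole-model round trip, again with a tiny stand-in model and temp path. Note that on newer PyTorch versions (2.6+) `torch.load` defaults to `weights_only=True`, so unpickling a whole module needs `weights_only=False`:

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 1)  # tiny stand-in for the real model
path = os.path.join(tempfile.gettempdir(), "model_pickle.pt")

# Save architecture + parameters as one pickled object
torch.save(model, path)

# One call restores a ready-to-use model; no class instantiation needed first
loaded = torch.load(path, weights_only=False)

assert isinstance(loaded, nn.Linear)
assert torch.equal(loaded.weight, model.weight)
```

The trade-off: the pickle stores a reference to the model class, so loading it requires the same class definitions (and module layout) to be importable, which is why the state_dict approach is generally preferred for sharing models.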
☝ Downloading data directly inside the Colab VM is very slow. The fastest approach is to download locally, upload to Google Drive, and then drag-and-drop the data from My Drive into your current working directory!

checkpoint
- Saves intermediate training results
- Used to look back at earlier results when applying early stopping
- Saved under a filename containing epoch + loss + metric
(Essential for smooth training on Colab, where sessions disconnect frequently)
torch.save({
    'epoch': e,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': epoch_loss,
}, f"saved/checkpoint_model_{e}_{epoch_loss/len(dataloader)}_{epoch_acc/len(dataloader)}.pt")
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
- epoch
- model parameters: model.state_dict()
- optimizer parameters: optimizer.state_dict()
- loss
⇒ Save these four items together as a dictionary, then restore each one when loading.
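The four-item checkpoint round trip can be verified in isolation. A self-contained sketch with a stand-in model and a temp file (epoch and loss values are illustrative):

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 1)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Save the four items as a dictionary
path = os.path.join(tempfile.gettempdir(), "checkpoint.pt")
torch.save({
    'epoch': 3,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': 0.25,
}, path)

# Restore each item into a fresh model and optimizer
new_model = nn.Linear(4, 1)
new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.1)
checkpoint = torch.load(path)
new_model.load_state_dict(checkpoint['model_state_dict'])
new_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

assert epoch == 3 and loss == 0.25
assert torch.equal(new_model.weight, model.weight)
```

Restoring the optimizer state matters when resuming: momentum buffers and learning-rate schedules continue from where training stopped instead of restarting cold.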
(Extra) When loading data, delete any file that PIL cannot open.

from PIL import Image

for f_name in f_path:  # f_path: iterable of image file paths
    try:
        Image.open(f_name)
    except Exception as e:
        print(e)
        os.remove(f_name)
(Extra) Code to silence overly frequent user warnings (recommended):
import warnings
warnings.filterwarnings("ignore")
(Extra) BCEWithLogitsLoss(): binary cross entropy combined with the sigmoid function.
In a cat-vs-dog classifier, if you specify this loss you do not need to attach a sigmoid to the last layer; the sigmoid is applied inside the loss.
(The logit is the inverse function of the sigmoid: logit(p) = log(p / (1 - p)), so sigmoid(logit(p)) = p.)
⇒ Often used because you can adapt a model by changing only the final loss.
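Both claims can be checked numerically: BCEWithLogitsLoss on raw logits matches BCELoss after a sigmoid, and the logit function inverts the sigmoid. A small sketch (the logit and target values are arbitrary examples):

```python
import torch
from torch import nn

logits = torch.tensor([[-1.2], [0.3], [2.5]])   # raw model outputs, no sigmoid
targets = torch.tensor([[0.], [1.], [1.]])

# BCEWithLogitsLoss == sigmoid + BCELoss, fused for numerical stability
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)
loss_split = nn.BCELoss()(torch.sigmoid(logits), targets)
assert torch.allclose(loss_fused, loss_split)

# logit is the inverse of sigmoid: log(p / (1 - p)) recovers the raw logits
p = torch.sigmoid(logits)
assert torch.allclose(torch.log(p / (1 - p)), logits, atol=1e-6)
```

The fused version is also safer in practice: computing sigmoid and log separately can underflow for large-magnitude logits, while the fused loss uses the log-sum-exp trick internally.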
☝ When working in Colab, save checkpoints inside My Drive so you can load them again later.

Transfer Learning
- Apply your own data to a model trained on a large dataset
- torchvision provides a wide range of pretrained models out of the box
- Freeze part of the model and train the rest:
freeze all the other layers and train only the final linear part
1) Pretrained model Loading
import torch
from torchvision import models
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
vgg = models.vgg16(pretrained=True).to(device)
- Ways to inspect the loaded parameters
1) torchsummary
from torchsummary import summary
summary(vgg, (3, 224, 224))
- output
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
MaxPool2d-31 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-32 [-1, 512, 7, 7] 0
Linear-33 [-1, 4096] 102,764,544
ReLU-34 [-1, 4096] 0
Dropout-35 [-1, 4096] 0
Linear-36 [-1, 4096] 16,781,312
ReLU-37 [-1, 4096] 0
Dropout-38 [-1, 4096] 0
Linear-39 [-1, 1000] 4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.78
Params size (MB): 527.79
Estimated Total Size (MB): 747.15
----------------------------------------------------------------
2) named_modules()
for name, layer in vgg.named_modules():
    print(name, layer)
- output
(The first entry is the root module itself, printed in full exactly as under print(vgg) below; then each named submodule is printed on its own line. The two large Sequential blocks are abbreviated here as "Sequential(...)".)
features Sequential(...)
features.0 Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.1 ReLU(inplace=True)
features.2 Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.3 ReLU(inplace=True)
features.4 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
features.5 Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.6 ReLU(inplace=True)
features.7 Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.8 ReLU(inplace=True)
features.9 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
features.10 Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.11 ReLU(inplace=True)
features.12 Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.13 ReLU(inplace=True)
features.14 Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.15 ReLU(inplace=True)
features.16 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
features.17 Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.18 ReLU(inplace=True)
features.19 Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.20 ReLU(inplace=True)
features.21 Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.22 ReLU(inplace=True)
features.23 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
features.24 Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.25 ReLU(inplace=True)
features.26 Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.27 ReLU(inplace=True)
features.28 Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
features.29 ReLU(inplace=True)
features.30 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
avgpool AdaptiveAvgPool2d(output_size=(7, 7))
classifier Sequential(...)
classifier.0 Linear(in_features=25088, out_features=4096, bias=True)
classifier.1 ReLU(inplace=True)
3) print(vgg)
- output
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
  (fc): Linear(in_features=1000, out_features=1, bias=True)
)
(The (fc) entry appears because this printout was captured after vgg.fc was assigned, as in the fine-tuning step below.)
2) Freezing the pretrained model's parameters
import torch
from torch import nn
from torchvision import models

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class MyNewNet(nn.Module):
    def __init__(self):
        super(MyNewNet, self).__init__()
        self.vgg19 = models.vgg19(pretrained=True)
        # Add a final Linear layer on top of the pretrained model
        self.linear_layers = nn.Linear(1000, 1)

    def forward(self, x):
        x = self.vgg19(x)
        return self.linear_layers(x)

my_model = MyNewNet().to(device)

# Freeze every parameter...
for param in my_model.parameters():
    param.requires_grad = False
# ...then unfreeze only the newly added linear layer
for param in my_model.linear_layers.parameters():
    param.requires_grad = True
(Stepwise freezing: a technique that trains while freezing different layers at different stages)
(For NLP, HuggingFace is the de facto standard)
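The stepwise-freezing idea above can be sketched as follows. This is a hypothetical schedule on a tiny three-block stand-in model (the block names and the one-block-per-stage schedule are illustrative, not from the original post):

```python
import torch
from torch import nn

# Hypothetical model with three blocks: earliest layers first, head last
model = nn.Sequential(
    nn.Linear(8, 8),   # block 0 (earliest features)
    nn.Linear(8, 8),   # block 1
    nn.Linear(8, 1),   # block 2 (head)
)

def set_trainable(stage):
    """Stage 0 trains only the head; each later stage unfreezes one more block."""
    for i, block in enumerate(model):
        requires_grad = i >= len(model) - 1 - stage
        for p in block.parameters():
            p.requires_grad = requires_grad

set_trainable(0)   # only block 2 (the head) is trainable
assert not any(p.requires_grad for p in model[0].parameters())
assert all(p.requires_grad for p in model[2].parameters())

set_trainable(1)   # blocks 1 and 2 are now trainable; block 0 stays frozen
assert all(p.requires_grad for p in model[1].parameters())
```

In practice you would run a few epochs between the `set_trainable` calls, often lowering the learning rate each time a deeper block is unfrozen.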
3) Fine Tuning
- Matching the number of output classes (two options, each shown below):
vgg.fc = torch.nn.Linear(1000, 1)
vgg.classifier._modules['6'] = torch.nn.Linear(4096, 1)
- Original model
for name, layer in vgg.named_modules():
    print(name, layer)
----------------------------------------------------------------
# only the tail of the output is shown
classifier Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
classifier.0 Linear(in_features=25088, out_features=4096, bias=True)
classifier.1 ReLU(inplace=True)
classifier.2 Dropout(p=0.5, inplace=False)
classifier.3 Linear(in_features=4096, out_features=4096, bias=True)
classifier.4 ReLU(inplace=True)
classifier.5 Dropout(p=0.5, inplace=False)
**classifier.6 Linear(in_features=4096, out_features=1000, bias=True)**
- Attaching an FC layer at the end
vgg.fc = torch.nn.Linear(1000, 1)
vgg.cuda()
(Caution: this registers fc as a submodule, so it appears in the printout below, but VGG's forward() never calls it; to actually apply the new layer you must call it explicitly, as in the MyNewNet class above.)
----------------------------------------------------------------
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
**(6): Linear(in_features=4096, out_features=1000, bias=True)**
)
**(fc): Linear(in_features=1000, out_features=1, bias=True)**
)
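The caveat about `vgg.fc` can be demonstrated without downloading VGG. A sketch with a hypothetical stand-in whose `forward()` ignores extra attributes, exactly as VGG's does, plus one way to actually chain the new head:

```python
import torch
from torch import nn

class Backbone(nn.Module):
    """Stand-in for VGG: forward() only uses self.body, so .fc is ignored."""
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 1000)

    def forward(self, x):
        return self.body(x)

backbone = Backbone()
backbone.fc = nn.Linear(1000, 1)   # registered (shows up in print), never called

x = torch.randn(2, 4)
assert backbone(x).shape == (2, 1000)   # still 1000-dim: fc was not applied

# One way to actually apply the new head: chain the modules explicitly
model = nn.Sequential(backbone, backbone.fc)
assert model(x).shape == (2, 1)
```

This is why replacing `classifier._modules['6']` (which IS called in forward) changes the output, while merely assigning a new attribute does not.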
- Replacing a specific part of the model
vgg.classifier._modules['6'] = torch.nn.Linear(4096, 1)
vgg.cuda()
----------------------------------------------------------------
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
**(6): Linear(in_features=4096, out_features=1, bias=True)**
)
)
Finally, if you want to compare the parameters before and after freezing, run

it = my_model.linear_layers.parameters()
next(it)  # parameters of the layer that was not frozen

and check whether those parameters have changed.
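The same check can be made self-contained: after one training step, frozen parameters stay identical while the unfrozen head moves. A sketch with a tiny stand-in model (the loss is an arbitrary placeholder):

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 1))

# Freeze the first layer; train only the second
for p in model[0].parameters():
    p.requires_grad = False

frozen_before = model[0].weight.clone()
head_before = model[1].weight.clone()

# Optimize only the parameters that still require gradients
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1)
loss = model(torch.randn(16, 4)).pow(2).mean()  # placeholder objective
loss.backward()
optimizer.step()

assert torch.equal(model[0].weight, frozen_before)     # frozen: unchanged
assert not torch.equal(model[1].weight, head_before)   # trained: updated
```

Passing only the `requires_grad` parameters to the optimizer is a common extra safeguard; even without it, frozen parameters receive no gradient and are left untouched by `step()`.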
Author and source
Original post: https://velog.io/@ohado/PyTorch-Transfer-Learning. Author attribution is given at the URL above; copyright belongs to the original author.