[Learning YOLOv3 from Scratch] 7. How to Add an Attention Mechanism to a YOLOv3 Model

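Preface: The more I write in the [Learning YOLOv3 from Scratch] series, the more there is to write. The originally planned content was limited, but reading the code kept turning up new highlights, and those were added to the series. So far we have read the YOLOv3 code and covered the cfg file, model construction, and more. Building on that foundation, this article modifies the model code to integrate the SE and CBAM modules from the earlier Attention series into YOLOv3.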

1. Define the format

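Just as layers such as [convolutional], [maxpool], [net], and [route] are defined in the cfg, adding a new module requires deciding on its cfg format first. The following conventions are used.
The SE module (for details see: [Attention Mechanisms in CV] The Simplest and Easiest-to-Implement SE Module) has one parameter, reduction, whose default value is 16. Its cfg block is therefore written as follows.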

[se]
reduction=16

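The CBAM module (for details see: [Attention Mechanisms in CV] ECCV 2018 Convolutional Block Attention Module) combines channel attention and spatial attention, which between them take two parameters: ratio and kernel_size. The cfg format for CBAM is therefore defined as follows. Note that the cfg key is spelled kernelsize, without an underscore.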

[cbam]
ratio=16
kernelsize=7
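
To make the format concrete, here is a minimal sketch of what the two blocks above turn into once parsed. The helper parse_blocks is a hypothetical, simplified stand-in for the real parser shown in the next section, not the repository's code. It only illustrates that every value arrives as a string and has to be cast with int() later.

```python
def parse_blocks(text):
    """Parse darknet-style cfg text into a list of dicts (simplified sketch)."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('['):
            # "[se]" starts a new block whose type is "se"
            blocks.append({'type': line[1:-1].strip()})
        else:
            key, val = line.split('=', 1)
            blocks[-1][key.strip()] = val.strip()
    return blocks


cfg_text = """
[se]
reduction=16

[cbam]
ratio=16
kernelsize=7
"""

blocks = parse_blocks(cfg_text)
print(blocks)
```

Keeping the values as strings mirrors how the other layer types are handled: the code that builds the module decides the type of each field.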

2. Modify the parsing code

Because these parameters are user-defined, the function that parses the cfg file has to be modified. As covered earlier in the series, the part to change is in parse_config.py.

import os

import numpy as np


def parse_model_cfg(path):
    # path: e.g. cfg/yolov3-tiny.cfg
    if not path.endswith('.cfg'):
        path += '.cfg'
    if not os.path.exists(path) and \
            os.path.exists('cfg' + os.sep + path):
        path = 'cfg' + os.sep + path

    with open(path, 'r') as f:
        lines = f.read().split('\n')

    # Drop empty lines and comment lines, then strip surrounding whitespace
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.rstrip().lstrip() for x in lines]
    mdefs = []  # module definitions
    for line in lines:
        if line.startswith('['):  # marks the start of a new block
            '''
            eg:
            [shortcut]
            from=-3
            activation=linear
            '''
            mdefs.append({})
            mdefs[-1]['type'] = line[1:-1].rstrip()
            if mdefs[-1]['type'] == 'convolutional':
                mdefs[-1]['batch_normalize'] = 0 
        else:
            key, val = line.split("=")
            key = key.rstrip()

            if 'anchors' in key:
                mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2))
            else:
                mdefs[-1][key] = val.strip()

    # Check all fields are supported
    supported = ['type', 'batch_normalize', 'filters', 'size',\
                 'stride', 'pad', 'activation', 'layers', \
                 'groups','from', 'mask', 'anchors', \
                 'classes', 'num', 'jitter', 'ignore_thresh',\
                 'truth_thresh', 'random',\
                 'stride_x', 'stride_y']

    f = []  # fields
    for x in mdefs[1:]:
        [f.append(k) for k in x if k not in f]
    u = [x for x in f if x not in supported]  # unsupported fields
    assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)

    return mdefs

In the code above, extend the supported list with our new fields.

supported = ['type', 'batch_normalize', 'filters', 'size',\
            'stride', 'pad', 'activation', 'layers', \
            'groups','from', 'mask', 'anchors', \
            'classes', 'num', 'jitter', 'ignore_thresh',\
            'truth_thresh', 'random',\
            'stride_x', 'stride_y',\
            'ratio', 'reduction', 'kernelsize']
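
The effect of this change can be sketched in isolation. The function unsupported_fields below is a hypothetical helper that mirrors the check at the end of parse_model_cfg. It is an illustration under that assumption, not the repository's code. It shows that the three new keys pass while a misspelled key such as kernel_size would still be rejected.

```python
SUPPORTED = ['type', 'batch_normalize', 'filters', 'size',
             'stride', 'pad', 'activation', 'layers',
             'groups', 'from', 'mask', 'anchors',
             'classes', 'num', 'jitter', 'ignore_thresh',
             'truth_thresh', 'random',
             'stride_x', 'stride_y',
             'ratio', 'reduction', 'kernelsize']


def unsupported_fields(mdefs, supported=SUPPORTED):
    """Return keys used by any block after [net] that are not in `supported`."""
    seen = []
    for block in mdefs[1:]:  # mdefs[0] is the [net] block, which is not checked
        for key in block:
            if key not in seen:
                seen.append(key)
    return [k for k in seen if k not in supported]


mdefs = [
    {'type': 'net', 'batch': '1'},
    {'type': 'se', 'reduction': '16'},
    {'type': 'cbam', 'ratio': '16', 'kernelsize': '7'},
    {'type': 'cbam', 'ratio': '16', 'kernel_size': '7'},  # misspelled key
]

print(unsupported_fields(mdefs))
```

Only the misspelled key is reported, so a typo in the cfg fails early instead of being silently ignored.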

3. Implementing SE and CBAM

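For the underlying principles, see the two articles [Attention Mechanisms in CV] The Simplest and Easiest-to-Implement SE Module and [Attention Mechanisms in CV] ECCV 2018 Convolutional Block Attention Module. The code from those two articles is used directly below.
SE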

import torch
import torch.nn as nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)
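
As a rough feel for the cost of this layer, the number of weights in its two bias-free Linear layers can be computed by hand. The helper se_param_count below is hypothetical and only reproduces the arithmetic implied by the SELayer definition above.

```python
def se_param_count(channel, reduction=16):
    """Number of weights in the two bias-free Linear layers of SELayer."""
    hidden = channel // reduction
    # First layer: channel -> hidden, second layer: hidden -> channel
    return channel * hidden + hidden * channel


for c in (256, 512, 1024):
    print(c, se_param_count(c))
```

For a 1024-channel feature map with reduction=16 this comes to 131072 weights, and a larger reduction shrinks the hidden layer and the cost with it.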

CBAM

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3,7), "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1

        self.conv = nn.Conv2d(2,1,kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avgout, maxout], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)
    
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False), nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
        return self.sigmoid(avgout + maxout)

The code for both modules above is added to the models.py file.
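
The two attention classes each return only an attention map, so something has to multiply those maps into the features. A minimal sketch of one way to do that follows. The CBAM wrapper class is an assumption of this sketch, since the article does not show one. It applies channel attention first and spatial attention second, which is the sequential order described for CBAM. The two attention classes are repeated here only so that the block runs on its own.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
        return self.sigmoid(avgout + maxout)  # shape (B, C, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        assert kernel_size in (3, 7), "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avgout, maxout], dim=1)
        return self.sigmoid(self.conv(x))  # shape (B, 1, H, W)


class CBAM(nn.Module):
    """Hypothetical wrapper: channel attention, then spatial attention."""

    def __init__(self, in_planes, ratio=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(in_planes, ratio=ratio)
        self.sa = SpatialAttention(kernel_size=kernel_size)

    def forward(self, x):
        x = x * self.ca(x)  # reweight channels
        x = x * self.sa(x)  # reweight spatial positions
        return x


x = torch.randn(2, 64, 13, 13)
cbam = CBAM(64, ratio=16, kernel_size=7)
out = cbam(x)
print(out.shape)
```

Because the output has the same shape as the input, a wrapper like this could be registered for a [cbam] block in the same way as SELayer, reading ratio and kernelsize from the parsed cfg.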

4. Designing the cfg file

Here yolov3-tiny.cfg serves as the baseline, and the attention module is added to it.
Because CBAM is analogous to SE, SE is used as the example: it is placed right after the backbone to refine the extracted features.

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[se]
reduction=16

# The se module is added right after the backbone
#####backbone######

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear



[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
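
Inserting a layer shifts the index of every layer after it, so it is worth checking that the two [route] blocks still point where they should. The following is a small sketch of that check. The layer list is transcribed by hand from the cfg above, and resolve_routes is a hypothetical helper that treats negative indices as relative to the route layer itself.

```python
def resolve_routes(routes):
    """Map each route layer index to the absolute layer indices it reads."""
    return {idx: [r if r >= 0 else idx + r for r in refs]
            for idx, refs in routes.items()}


# Layer types of the cfg above, in order, excluding [net].
layers = (['convolutional', 'maxpool'] * 6                     # indices 0-11
          + ['convolutional', 'se']                            # 12: 1024-filter conv, 13: se
          + ['convolutional'] * 3 + ['yolo']                   # 14-16, 17
          + ['route', 'convolutional', 'upsample', 'route']    # 18-21
          + ['convolutional'] * 2 + ['yolo'])                  # 22-23, 24

routes = {18: [-4], 21: [-1, 8]}  # "layers = -4" and "layers = -1, 8"
resolved = resolve_routes(routes)

for idx, sources in resolved.items():
    print(idx, [(s, layers[s]) for s in sources])
```

The relative index -4 still lands on the 256-filter convolution (now index 14), and the absolute index 8 is unaffected because the se layer sits after it. An absolute index that pointed past the insertion point would have to be increased by one.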

5. Building the model

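That completes the preparation. Taking SE as the example, we modify the model-construction part of models.py and the forward function so that the module actually takes effect. Add the following to the create_modules function in models.py.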

        elif mdef['type'] == 'se':
            modules.add_module(
                'se_module',
                SELayer(output_filters[-1], reduction=int(mdef['reduction'])))
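
The snippet relies on output_filters[-1] being the channel count of the previous layer. A simplified sketch of that bookkeeping follows. It assumes create_modules keeps a running list with one entry per layer, as the snippet suggests. The function track_filters is hypothetical, and it ignores route and upsample layers for brevity.

```python
def track_filters(mdefs, in_channels=3):
    """Simplified channel bookkeeping: seeded with the input channels,
    then one entry appended per layer."""
    output_filters = [in_channels]
    for mdef in mdefs:
        filters = output_filters[-1]        # default: channel count is unchanged
        if mdef['type'] == 'convolutional':
            filters = int(mdef['filters'])  # only conv changes it in this sketch
        # 'maxpool' and 'se' fall through and keep the previous channel count
        output_filters.append(filters)
    return output_filters


mdefs = [
    {'type': 'convolutional', 'filters': '512'},
    {'type': 'maxpool', 'size': '2', 'stride': '1'},
    {'type': 'convolutional', 'filters': '1024'},
    {'type': 'se', 'reduction': '16'},
]

print(track_filters(mdefs))
```

When the se block is reached, the last entry is 1024, so SELayer is built for 1024 channels, and because SE does not change the channel count the next layer sees 1024 as well.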

Then modify the forward function of Darknet.

def forward(self, x, var=None):
    img_size = x.shape[-2:]
    layer_outputs = []
    output = []

    for i, (mdef,
            module) in enumerate(zip(self.module_defs, self.module_list)):
        mtype = mdef['type']
        if mtype in ['convolutional', 'upsample', 'maxpool']:
            x = module(x)
        elif mtype == 'route':
            layers = [int(x) for x in mdef['layers'].split(',')]
            if len(layers) == 1:
                x = layer_outputs[layers[0]]
            else:
                try:
                    x = torch.cat([layer_outputs[i] for i in layers], 1)
                except:  # apply stride 2 for darknet reorg layer
                    layer_outputs[layers[1]] = F.interpolate(
                        layer_outputs[layers[1]], scale_factor=[0.5, 0.5])
                    x = torch.cat([layer_outputs[i] for i in layers], 1)

        elif mtype == 'shortcut':
            x = x + layer_outputs[int(mdef['from'])]
        elif mtype == 'yolo':
            output.append(module(x, img_size))
        layer_outputs.append(x if i in self.routs else [])

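Adding the SE module to forward is very simple. The SE module is handled exactly like the convolutional, upsample, and maxpool layers, so no extra work is needed beyond changing the code above as follows.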

    for i, (mdef,
            module) in enumerate(zip(self.module_defs, self.module_list)):
        mtype = mdef['type']
        if mtype in ['convolutional', 'upsample', 'maxpool', 'se']:
            x = module(x)

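The overall process for CBAM is similar, so you can try it yourself and get familiar with the overall YOLOv3 workflow along the way.
Afterword: the content of this article is simple, and adding an attention module is easy to implement. However, where exactly to place the attention mechanism and how many modules to add must be verified by experiment. Attention is no cure-all, and only by tuning and trying a lot will you get satisfying results. You are welcome to contact me to join the group chat and share feedback on results with your own datasets.
PS: Take care of yourselves these days and wear a mask when you go out.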
