[Step-by-Step Dissection] RetinaNet from Scratch (1): Processing the COCO and VOC Datasets

Tags: deep learning, artificial intelligence, computer vision

Contents

Preface

Introduction to the COCO dataset

Introduction to the VOC dataset

File organization of the COCO and VOC datasets

Processing the COCO dataset

Processing the VOC dataset

All of the code has been uploaded to my GitHub repository: https://github.com/zgcr/pytorch-ImageNet-CIFAR-COCO-VOC-training. If you find it helpful, please give it a star! The code below was tested on PyTorch 1.4 and confirmed to work correctly.

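Preface

Having worked through ImageNet training in the earlier base-model series, I am finally starting to learn object detection. Object detection involves a great many details, and because papers usually leave them out, they can only be properly understood from code. The best way to learn is to implement a detector yourself. In this series I will implement the one-stage object detector RetinaNet from scratch, covering dataset processing, data augmentation, network structure, loss, decoding and so on.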

Introduction to the COCO dataset

COCO dataset official site: http://cocodataset.org/#home. COCO is a large-scale object detection dataset. It has been updated over the years, but object detection papers only use COCO 2014 and COCO 2017. COCO2017 contains three subsets, train (118287 images), val (5000 images) and test (40670 images), with 80 classes in total. The train and val sets both provide ground truth. The test set has no ground truth, so detection results must be submitted to the COCO website to obtain a score.
How do COCO 2014 and COCO 2017 differ? The Detectron source code that accompanies the RetinaNet paper (https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md) explains it. In the RetinaNet paper, all models are trained on the union of coco_2014_train (82783 images) and coco_2014_valminusminival (a random subset of 35504 images drawn from the 40504-image 2014 val set); this union is exactly the same as coco_2017_train. At test time, all models are evaluated on coco_2014_minival, the subset made up of the remaining 5000 val images, which is exactly the same as coco_2017_val. In other words, the RetinaNet paper effectively trains on coco_2017_train and tests on coco_2017_val. The performance reported in the paper is the mAP at IoU=0.5:0.95, keeping at most 100 detections per image and covering objects of all sizes (that is, the stats[0] value computed by _summarizeDets inside the summarize method of the COCOeval class in pycocotools.cocoeval).
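The image counts quoted above can be checked with a few lines of arithmetic. This is only a sanity check of the numbers stated in the text; the variable names are illustrative and not taken from Detectron.

```python
# Image counts quoted in the text above.
coco2014_train = 82783
coco2014_val = 40504
valminusminival = 35504  # random subset of the 2014 val set used for training

# Training union used in the RetinaNet paper.
train_union = coco2014_train + valminusminival
# Remaining val images, i.e. the minival split used for testing.
minival = coco2014_val - valminusminival

print(train_union)  # 118287, the size of COCO2017 train
print(minival)      # 5000, the size of COCO2017 val
```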
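How much does a model's performance differ between the val and test sets? Because train, val and test are in practice all random splits of the same parent dataset, the difference between val and test is very small. Judging from results in other papers that evaluate on both val and test, a model's mAP on the two sets usually differs by about 0.2-0.3 percentage points.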
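In the reproduction that follows, we adopt the dataset setup of the RetinaNet paper: train on coco_2017_train and test on coco_2017_val. The model's performance is reported as the mAP at IoU=0.5:0.95, keeping at most 100 detections per image and covering objects of all sizes (that is, the stats[0] value computed by _summarizeDets inside the summarize method of the COCOeval class in pycocotools.cocoeval).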

Introduction to the VOC dataset

VOC dataset official site: http://host.robots.ox.ac.uk/pascal/VOC/. VOC is also an object detection dataset, but it is much smaller than COCO. Object detection papers usually use VOC2007 and VOC2012. Like COCO, both VOC2007 and VOC2012 are divided into train, val and test subsets, with 20 classes in total. VOC2007 provides ground truth for all three subsets (train, val and test). VOC2012 provides ground truth only for the train and val subsets.
Following the way detectron2 trains and tests Faster R-CNN on VOC (https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md), we train the model on VOC2007 trainval + VOC2012 trainval and test it on VOC2007 test. At test time, mAP is computed with the VOC2007 11-point metric.
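As a rough illustration of the 11-point metric mentioned above, the sketch below computes the interpolated AP of a single class from a precision-recall curve. The function name and the toy curve are assumptions made for this example, not code from detectron2; mAP is then the mean of this value over the 20 classes.

```python
def voc07_11point_ap(recalls, precisions):
    """11-point interpolated AP for one class (VOC2007-style metric).

    recalls and precisions are equal-length sequences describing the
    precision-recall curve of that class.
    """
    ap = 0.0
    for i in range(11):
        t = i / 10.0  # recall thresholds 0.0, 0.1, ..., 1.0
        # Interpolated precision: the maximum precision over all points
        # whose recall is at least t (0 if no such point exists).
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += max(candidates) if candidates else 0.0
    return ap / 11.0


# Toy curve: precision stays at 1.0 but recall never exceeds 0.5.
recalls = [0.1, 0.2, 0.3, 0.4, 0.5]
precisions = [1.0, 1.0, 1.0, 1.0, 1.0]
ap = voc07_11point_ap(recalls, precisions)
print(round(ap, 4))  # 0.5455, i.e. 6 of the 11 thresholds are reached
```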

File organization of the COCO and VOC datasets

After downloading the COCO and VOC datasets, I arranged the folders as follows.

COCO2017
|
|
|----annotations----contains all annotation json files
|
|                  |----train2017
|----images--------|----val2017
                   |----test2017

VOCdataset
|
|
|                  |----Annotations
|                  |----ImageSets
|----VOC2007-------|----JPEGImages
|                  |----SegmentationClass
|                  |----SegmentationObject
|
|                  |----Annotations
|                  |----ImageSets
|----VOC2012-------|----JPEGImages
|                  |----SegmentationClass
|                  |----SegmentationObject

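Processing the COCO dataset

The raw box coordinates in the COCO2017 annotations are [x_min, y_min, w, h], i.e. the top-left corner of the box plus its width and height. We convert them to [x_min, y_min, x_max, y_max], i.e. the top-left and bottom-right corners. The annotations also provide a class index, but the raw indices are not contiguous (they range over 1-90 although there are only 80 classes), so we need to map them to contiguous class indices 0-79. The code for processing the COCO dataset is as follows.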

import os
import cv2
import torch
import numpy as np
import random
from torch.utils.data import Dataset
from pycocotools.coco import COCO
import torch.nn.functional as F

COCO_CLASSES = [
    "person",
    "bicycle",
    "car",
    "motorcycle",
    "airplane",
    "bus",
    "train",
    "truck",
    "boat",
    "traffic light",
    "fire hydrant",
    "stop sign",
    "parking meter",
    "bench",
    "bird",
    "cat",
    "dog",
    "horse",
    "sheep",
    "cow",
    "elephant",
    "bear",
    "zebra",
    "giraffe",
    "backpack",
    "umbrella",
    "handbag",
    "tie",
    "suitcase",
    "frisbee",
    "skis",
    "snowboard",
    "sports ball",
    "kite",
    "baseball bat",
    "baseball glove",
    "skateboard",
    "surfboard",
    "tennis racket",
    "bottle",
    "wine glass",
    "cup",
    "fork",
    "knife",
    "spoon",
    "bowl",
    "banana",
    "apple",
    "sandwich",
    "orange",
    "broccoli",
    "carrot",
    "hot dog",
    "pizza",
    "donut",
    "cake",
    "chair",
    "couch",
    "potted plant",
    "bed",
    "dining table",
    "toilet",
    "tv",
    "laptop",
    "mouse",
    "remote",
    "keyboard",
    "cell phone",
    "microwave",
    "oven",
    "toaster",
    "sink",
    "refrigerator",
    "book",
    "clock",
    "vase",
    "scissors",
    "teddy bear",
    "hair drier",
    "toothbrush",
]

colors = [
    (39, 129, 113),
    (164, 80, 133),
    (83, 122, 114),
    (99, 81, 172),
    (95, 56, 104),
    (37, 84, 86),
    (14, 89, 122),
    (80, 7, 65),
    (10, 102, 25),
    (90, 185, 109),
    (106, 110, 132),
    (169, 158, 85),
    (188, 185, 26),
    (103, 1, 17),
    (82, 144, 81),
    (92, 7, 184),
    (49, 81, 155),
    (179, 177, 69),
    (93, 187, 158),
    (13, 39, 73),
    (12, 50, 60),
    (16, 179, 33),
    (112, 69, 165),
    (15, 139, 63),
    (33, 191, 159),
    (182, 173, 32),
    (34, 113, 133),
    (90, 135, 34),
    (53, 34, 86),
    (141, 35, 190),
    (6, 171, 8),
    (118, 76, 112),
    (89, 60, 55),
    (15, 54, 88),
    (112, 75, 181),
    (42, 147, 38),
    (138, 52, 63),
    (128, 65, 149),
    (106, 103, 24),
    (168, 33, 45),
    (28, 136, 135),
    (86, 91, 108),
    (52, 11, 76),
    (142, 6, 189),
    (57, 81, 168),
    (55, 19, 148),
    (182, 101, 89),
    (44, 65, 179),
    (1, 33, 26),
    (122, 164, 26),
    (70, 63, 134),
    (137, 106, 82),
    (120, 118, 52),
    (129, 74, 42),
    (182, 147, 112),
    (22, 157, 50),
    (56, 50, 20),
    (2, 22, 177),
    (156, 100, 106),
    (21, 35, 42),
    (13, 8, 121),
    (142, 92, 28),
    (45, 118, 33),
    (105, 118, 30),
    (7, 185, 124),
    (46, 34, 146),
    (105, 184, 169),
    (22, 18, 5),
    (147, 71, 73),
    (181, 64, 91),
    (31, 39, 184),
    (164, 179, 33),
    (96, 50, 18),
    (95, 15, 106),
    (113, 68, 54),
    (136, 116, 112),
    (119, 139, 130),
    (31, 139, 34),
    (66, 6, 127),
    (62, 39, 2),
    (49, 99, 180),
    (49, 119, 155),
    (153, 50, 183),
    (125, 38, 3),
    (129, 87, 143),
    (49, 87, 40),
    (128, 62, 120),
    (73, 85, 148),
    (28, 144, 118),
    (29, 9, 24),
    (175, 45, 108),
    (81, 175, 64),
    (178, 19, 157),
    (74, 188, 190),
    (18, 114, 2),
    (62, 128, 96),
    (21, 3, 150),
    (0, 6, 95),
    (2, 20, 184),
    (122, 37, 185),
]


class CocoDetection(Dataset):
    def __init__(self,
                 image_root_dir,
                 annotation_root_dir,
                 set='train2017',
                 transform=None):
        self.image_root_dir = image_root_dir
        self.annotation_root_dir = annotation_root_dir
        self.set_name = set
        self.transform = transform

        self.coco = COCO(
            os.path.join(self.annotation_root_dir,
                         'instances_' + self.set_name + '.json'))

        self.load_classes()

    def load_classes(self):
        self.image_ids = self.coco.getImgIds()
        self.cat_ids = self.coco.getCatIds()
        self.categories = self.coco.loadCats(self.cat_ids)
        self.categories.sort(key=lambda x: x['id'])

        # category_id is an original id,coco_id is set from 0 to 79
        self.category_id_to_coco_label = {
            category['id']: i
            for i, category in enumerate(self.categories)
        }
        self.coco_label_to_category_id = {
            v: k
            for k, v in self.category_id_to_coco_label.items()
        }

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, idx):
        img = self.load_image(idx)
        annot = self.load_annotations(idx)

        sample = {'img': img, 'annot': annot, 'scale': 1.}
        if self.transform:
            sample = self.transform(sample)
        return sample

    def load_image(self, image_index):
        image_info = self.coco.loadImgs(self.image_ids[image_index])[0]
        path = os.path.join(self.image_root_dir, image_info['file_name'])
        img = cv2.imread(path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        return img.astype(np.float32) / 255.

    def load_annotations(self, image_index):
        # get ground truth annotations
        annotations_ids = self.coco.getAnnIds(
            imgIds=self.image_ids[image_index], iscrowd=None)
        annotations = np.zeros((0, 5))

        # some images appear to miss annotations
        if len(annotations_ids) == 0:
            return annotations

        # parse annotations
        coco_annotations = self.coco.loadAnns(annotations_ids)
        for _, a in enumerate(coco_annotations):
            # some annotations have basically no width / height, skip them
            if a['bbox'][2] < 1 or a['bbox'][3] < 1:
                continue

            annotation = np.zeros((1, 5))
            annotation[0, :4] = a['bbox']
            annotation[0, 4] = self.find_coco_label_from_category_id(
                a['category_id'])

            annotations = np.append(annotations, annotation, axis=0)

        # transform from [x_min, y_min, w, h] to [x_min, y_min, x_max, y_max]
        annotations[:, 2] = annotations[:, 0] + annotations[:, 2]
        annotations[:, 3] = annotations[:, 1] + annotations[:, 3]

        return annotations

    def find_coco_label_from_category_id(self, category_id):
        return self.category_id_to_coco_label[category_id]

    def find_category_id_from_coco_label(self, coco_label):
        return self.coco_label_to_category_id[coco_label]

    def num_classes(self):
        return 80

    def image_aspect_ratio(self, image_index):
        image = self.coco.loadImgs(self.image_ids[image_index])[0]
        return float(image['width']) / float(image['height'])

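Each item obtained by iterating over this class is the information for one image, stored in a dict: the value under the key 'img' is the image, and the numpy array under the key 'annot' holds the annotated objects of that image. Note that the number of annotated objects is not necessarily the same for every image, and some images may have no annotated objects at all.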
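The two conversions performed inside the class above, box format and class index, can also be looked at in isolation. The following is a minimal sketch in plain Python; the function names and the short list of sparse ids are made up for illustration, whereas the real code takes the category ids from the annotation file.

```python
def xywh_to_xyxy(box):
    """Convert [x_min, y_min, w, h] to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, w, h = box
    return [x_min, y_min, x_min + w, y_min + h]


def build_label_maps(category_ids):
    """Map sparse category ids to contiguous labels 0..N-1 and back."""
    id_to_label = {cid: i for i, cid in enumerate(sorted(category_ids))}
    label_to_id = {i: cid for cid, i in id_to_label.items()}
    return id_to_label, label_to_id


# Hypothetical sparse ids, standing in for the 80 ids COCO spreads over 1-90.
id_to_label, label_to_id = build_label_maps([1, 2, 3, 5, 90])

print(xywh_to_xyxy([10.0, 20.0, 30.0, 40.0]))  # [10.0, 20.0, 40.0, 60.0]
print(id_to_label[5], label_to_id[4])          # 3 90
```

Keeping the reverse map around mirrors find_category_id_from_coco_label in the class above, which translates a contiguous label back to the original category id.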

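Processing the VOC dataset

The raw box coordinates in the VOC annotations are already [x_min, y_min, x_max, y_max], so no coordinate conversion is needed. The annotations only provide class names, which we map to class indices 0-19. The code for processing the VOC dataset is as follows.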

import os
import cv2
import numpy as np
import random
import xml.etree.ElementTree as ET

import torch
from torch.utils.data import Dataset

VOC_CLASSES = [
    "aeroplane",
    "bicycle",
    "bird",
    "boat",
    "bottle",
    "bus",
    "car",
    "cat",
    "chair",
    "cow",
    "diningtable",
    "dog",
    "horse",
    "motorbike",
    "person",
    "pottedplant",
    "sheep",
    "sofa",
    "train",
    "tvmonitor",
]

colors = [
    (39, 129, 113),
    (164, 80, 133),
    (83, 122, 114),
    (99, 81, 172),
    (95, 56, 104),
    (37, 84, 86),
    (14, 89, 122),
    (80, 7, 65),
    (10, 102, 25),
    (90, 185, 109),
    (106, 110, 132),
    (169, 158, 85),
    (188, 185, 26),
    (103, 1, 17),
    (82, 144, 81),
    (92, 7, 184),
    (49, 81, 155),
    (179, 177, 69),
    (93, 187, 158),
    (13, 39, 73),
]


class VocDetection(Dataset):
    def __init__(self,
                 root_dir,
                 image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
                 transform=None,
                 keep_difficult=False):
        self.root_dir = root_dir
        self.image_set = image_sets
        self.transform = transform
        self.categories = VOC_CLASSES

        self.category_id_to_voc_label = dict(
            zip(self.categories, range(len(self.categories))))
        self.voc_label_to_category_id = {
            v: k
            for k, v in self.category_id_to_voc_label.items()
        }

        self.keep_difficult = keep_difficult

        self._annopath = os.path.join('%s', 'Annotations', '%s.xml')
        self._imgpath = os.path.join('%s', 'JPEGImages', '%s.jpg')
        self.ids = list()
        for (year, name) in image_sets:
            rootpath = os.path.join(self.root_dir, 'VOC' + year)
            # use a context manager so the image-set file is closed after reading
            with open(
                    os.path.join(rootpath, 'ImageSets', 'Main',
                                 name + '.txt')) as f:
                for line in f:
                    self.ids.append((rootpath, line.strip()))

    def __getitem__(self, idx):
        img_id = self.ids[idx]
        img = self.load_image(img_id)

        target = ET.parse(self._annopath % img_id).getroot()
        annot = self.load_annotations(target)

        sample = {'img': img, 'annot': annot, 'scale': 1.}

        if self.transform:
            sample = self.transform(sample)
        return sample

    def load_image(self, img_id):
        img = cv2.imread(self._imgpath % img_id)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        return img.astype(np.float32) / 255.

    def load_annotations(self, target):
        annotations = []
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')

            pts = ['xmin', 'ymin', 'xmax', 'ymax']

            bndbox = []
            for pt in pts:
                cur_pt = float(bbox.find(pt).text)
                bndbox.append(cur_pt)
            label_idx = self.category_id_to_voc_label[name]
            bndbox.append(label_idx)
            annotations += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
            # img_id = target.find('filename').text[:-4]

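        # format:[[x1, y1, x2, y2, label_ind], ... ]
        # reshape keeps the (N, 5) layout even when every object was skipped,
        # so an image without boxes yields shape (0, 5) as in the COCO class
        annotations = np.array(annotations, dtype=np.float64).reshape(-1, 5)
        return annotations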

    def find_category_id_from_voc_label(self, voc_label):
        return self.voc_label_to_category_id[voc_label]

    def image_aspect_ratio(self, idx):
        img_id = self.ids[idx]
        image = self.load_image(img_id)
        #w/h
        return float(image.shape[1]) / float(image.shape[0])

    def __len__(self):
        return len(self.ids)

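This works like the COCO class. Each item obtained by iterating over this class is the information for one image, stored in a dict: the value under the key 'img' is the image, and the numpy array under the key 'annot' holds the annotated objects of that image. Note that the number of annotated objects is not necessarily the same for every image, and some images may have no annotated objects at all.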
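To see what load_annotations does without downloading the dataset, the sketch below parses a small hand-written annotation in the VOC XML layout. The XML content, the parse_objects helper and the two-entry NAME_TO_LABEL table are assumptions for this example; the label values follow the positions of "dog" and "person" in VOC_CLASSES above.

```python
import xml.etree.ElementTree as ET

# Hand-written annotation in the VOC XML layout (not taken from the dataset).
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""

# Only two classes are listed to keep the example short.
NAME_TO_LABEL = {"dog": 11, "person": 14}


def parse_objects(xml_text, keep_difficult=False):
    """Return [[xmin, ymin, xmax, ymax, label], ...] for one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        # Objects flagged as difficult are skipped unless explicitly kept.
        difficult = int(obj.find("difficult").text) == 1
        if difficult and not keep_difficult:
            continue
        name = obj.find("name").text.lower().strip()
        bndbox = obj.find("bndbox")
        box = [float(bndbox.find(k).text)
               for k in ("xmin", "ymin", "xmax", "ymax")]
        box.append(NAME_TO_LABEL[name])
        boxes.append(box)
    return boxes


print(parse_objects(SAMPLE_XML))
# [[48.0, 240.0, 195.0, 371.0, 11]]
print(len(parse_objects(SAMPLE_XML, keep_difficult=True)))  # 2
```

With keep_difficult=False the second object is dropped, which is why an image can end up with fewer boxes than its XML file lists, or with none.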


You may freely share or copy this text, but please keep this document's URL as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
