Loss in Faster R-CNN Object Detection

This section covers the loss of Faster R-CNN. The loss has two parts, the RPN loss and the Fast R-CNN loss, and the final Faster R-CNN loss is their sum.
The forward pass of the RPN loss and of the Fast R-CNN loss is essentially the same. Each consists of a classification loss and a localization loss: classification uses the cross-entropy loss, and localization uses the Smooth L1 loss.
The most important part of the loss computation is how the network's predictions are matched to the ground-truth boxes. It is also the hardest part to understand, so it is described in detail below.
RPN Loss
Forward pass of the RPN loss
The forward pass of the RPN loss proceeds as follows.

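1. Compute the IoU between every anchor and every bbox (ground-truth box), giving a matrix of shape (len(anchor), len(bbox)). For each anchor, extract its highest IoU and the column index of the bbox that produces it; for each bbox, extract the row index of the anchor that has the highest IoU with it.
2. Using the IoUs and indices from the previous step together with the configured thresholds (foreground IoU threshold and background IoU threshold), give every anchor a label (-1 -> ignore, 0 -> background, 1 -> foreground).
3. Randomly sample foreground and background anchors according to the configured number of anchors to keep and the foreground ratio.
4. For the kept anchors, compute the coordinate offsets between each anchor and the bbox that has the highest IoU with it.
5. Finally, combine these targets with the output of the RPN to compute the classification loss and the localization loss.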
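
The labeling rule in the first two steps can be tried on a toy example. The sketch below uses a made-up IoU matrix (4 anchors, 2 ground-truth boxes) and the default thresholds 0.7 and 0.3; the numbers are purely illustrative.

```python
import numpy as np

# Hypothetical IoU matrix: 4 anchors (rows) x 2 ground-truth boxes (columns).
ious = np.array([[0.80, 0.10],
                 [0.20, 0.25],
                 [0.05, 0.60],
                 [0.40, 0.50]])
pos_iou_thresh, neg_iou_thresh = 0.7, 0.3

argmax_ious = ious.argmax(axis=1)   # best bbox (column) for each anchor
max_ious = ious.max(axis=1)         # the IoU with that bbox
gt_max_ious = ious.max(axis=0)      # best IoU reached for each bbox
# rows of all anchors that reach a bbox's best IoU (ties included)
gt_argmax_ious = np.where(ious == gt_max_ious)[0]

label = np.full(len(ious), -1, dtype=np.int32)   # start as "ignore"
label[max_ious < neg_iou_thresh] = 0             # background
label[gt_argmax_ious] = 1                        # best anchor of each bbox
label[max_ious >= pos_iou_thresh] = 1            # high-IoU anchors

print(label)  # [ 1  0  1 -1]
```

Anchor 2 only reaches an IoU of 0.6, below the foreground threshold, but it is still labeled foreground because it is the best match for the second box. Anchor 3 falls between the two thresholds and is ignored.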
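Code implementation
The code of the AnchorTargetCreator class is shown here. This class matches anchors with the ground-truth boxes (bbox). Its main job is to assign a bbox to every anchor and to return the matched coordinate offsets and the foreground/background labels.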

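import numpy as np

# NOTE: bbox_iou and bbox2loc are helper functions that are not shown in this section.


class AnchorTargetCreator(object):
    """
    Assign ground-truth boxes (bbox) to anchors.

    Args:
        n_sample (int): number of anchors to keep after sampling.
        pos_iou_thresh (float): anchors whose IoU is at or above this threshold are labeled foreground.
        neg_iou_thresh (float): anchors whose IoU is below this threshold are labeled background.
        pos_ratio (float): target fraction of foreground anchors among the sampled anchors.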

    """

    def __init__(self,
                 n_sample=256,
                 pos_iou_thresh=0.7, neg_iou_thresh=0.3,
                 pos_ratio=0.5):
        self.n_sample = n_sample
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh = neg_iou_thresh
        self.pos_ratio = pos_ratio

    def __call__(self, bbox, anchor, img_size):
        """
        * :math:`S` is the number of anchors.
        * :math:`R` is the number of bboxes.
        Args:
            bbox (array): bbox coordinates. shape -> :math:`(R, 4)`.
            anchor (array): anchor coordinates. shape -> :math:`(S, 4)`.
            img_size (tuple of ints): image size :obj:`H, W` (height and width).
        Returns:
            #NOTE: it's scale not only  offset
            * **loc**: offsets and scales from each anchor to its matched bbox. shape -> :math:`(S, 4)`.
            * **label**: label of each anchor, :obj:`(1=positive, 0=negative, -1=ignore)`. shape -> :math:`(S,)`.
        """

        img_H, img_W = img_size

        n_anchor = len(anchor)

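        # keep only the anchors that lie completely inside the image
        inside_index = _get_inside_index(anchor, img_H, img_W)
        anchor = anchor[inside_index]

        # match each anchor to a bbox (by index) and assign foreground/background labels
        argmax_ious, label = self._create_label(
            inside_index, anchor, bbox)

        # anchor -> bbox: compute the coordinate offsets
        loc = bbox2loc(anchor, bbox[argmax_ious])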

        # map up to original set of anchors
        label = _unmap(label, n_anchor, inside_index, fill=-1)
        loc = _unmap(loc, n_anchor, inside_index, fill=0)

        return loc, label

    def _create_label(self, inside_index, anchor, bbox):
        # label: 1 is positive, 0 is negative, -1 is dont care
        # initialize the label of every anchor to -1.
        # -1 means ignore, 0 means background, 1 means foreground/object
        label = np.empty((len(inside_index),), dtype=np.int32)
        label.fill(-1)

        argmax_ious, max_ious, gt_argmax_ious = \
            self._calc_ious(anchor, bbox, inside_index)

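        # assign negative labels first so that positive labels can clobber them
        # for each anchor, if its highest IoU is below the background IoU threshold, set its label to 0
        label[max_ious < self.neg_iou_thresh] = 0

        # for each bbox, the anchor(s) with the highest IoU get label 1
        label[gt_argmax_ious] = 1

        # for each anchor, if its highest IoU is at or above the foreground IoU threshold, set its label to 1
        label[max_ious >= self.pos_iou_thresh] = 1

        # if there are more foreground anchors than the quota, randomly set the surplus to ignore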
        n_pos = int(self.pos_ratio * self.n_sample)
        pos_index = np.where(label == 1)[0]
        if len(pos_index) > n_pos:
            disable_index = np.random.choice(
                pos_index, size=(len(pos_index) - n_pos), replace=False)
            label[disable_index] = -1

        # if there are more background anchors than the remaining quota, randomly set the surplus to ignore
        n_neg = self.n_sample - np.sum(label == 1)
        neg_index = np.where(label == 0)[0]
        if len(neg_index) > n_neg:
            disable_index = np.random.choice(
                neg_index, size=(len(neg_index) - n_neg), replace=False)
            label[disable_index] = -1

        return argmax_ious, label

    def _calc_ious(self, anchor, bbox, inside_index):
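        # IoU between anchors and bboxes, ious -> (len(inside_index), len(bbox))
        ious = bbox_iou(anchor, bbox)
        # for each anchor, index of the bbox (column index) with the highest IoU
        argmax_ious = ious.argmax(axis=1)

        # the corresponding highest IoU, shape -> (len(inside_index),)
        max_ious = ious[np.arange(len(inside_index)), argmax_ious]

        # for each bbox, index of the anchor (row index) with the highest IoU
        gt_argmax_ious = ious.argmax(axis=0)

        # the corresponding highest IoU, shape -> (len(bbox), )
        gt_max_ious = ious[gt_argmax_ious, np.arange(ious.shape[1])]

        # row indices of all anchors that reach a bbox's highest IoU (ties included)
        gt_argmax_ious = np.where(ious == gt_max_ious)[0]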

        return argmax_ious, max_ious, gt_argmax_ious


def _unmap(data, count, index, fill=0):
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=data.dtype)
        ret.fill(fill)
        ret[index] = data
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=data.dtype)
        ret.fill(fill)
        ret[index, :] = data
    return ret


def _get_inside_index(anchor, H, W):
    # indices of the anchors that lie completely inside the image.
    index_inside = np.where(
        (anchor[:, 0] >= 0) &
        (anchor[:, 1] >= 0) &
        (anchor[:, 2] <= H) &
        (anchor[:, 3] <= W)
    )[0]
    return index_inside
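
`AnchorTargetCreator` relies on a `bbox_iou` helper that is not shown in this section. A minimal sketch of what such a helper might look like follows, assuming boxes are stored as (y_min, x_min, y_max, x_max), which is the order `_get_inside_index` implies. It is an illustration under that assumption and not necessarily the implementation the original code imports.

```python
import numpy as np


def bbox_iou(bbox_a, bbox_b):
    """Pairwise IoU of two box sets; returns shape (len(bbox_a), len(bbox_b)).

    Assumes each box is (y_min, x_min, y_max, x_max).
    """
    bbox_a = np.asarray(bbox_a, dtype=np.float64)
    bbox_b = np.asarray(bbox_b, dtype=np.float64)

    # top-left and bottom-right corners of every pairwise intersection
    tl = np.maximum(bbox_a[:, None, :2], bbox_b[:, :2])
    br = np.minimum(bbox_a[:, None, 2:], bbox_b[:, 2:])

    # intersection area, forced to 0 where the boxes do not overlap
    area_i = np.prod(br - tl, axis=2) * (tl < br).all(axis=2)
    area_a = np.prod(bbox_a[:, 2:] - bbox_a[:, :2], axis=1)
    area_b = np.prod(bbox_b[:, 2:] - bbox_b[:, :2], axis=1)
    return area_i / (area_a[:, None] + area_b - area_i)


anchors = np.array([[0, 0, 2, 2]])
gt_boxes = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [5, 5, 6, 6]])
example_ious = bbox_iou(anchors, gt_boxes)  # identical, partly overlapping, disjoint
```

The three ground-truth boxes give IoUs of 1, 1/7 and 0 with the single anchor, which covers the identical, partially overlapping and disjoint cases.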

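Fast R-CNN Loss
The detailed code of the ProposalTargetCreator class is shown directly. It matches the region proposals to the ground-truth boxes. Its main job is to assign a bbox to every RoI and to return the sampled RoIs, the coordinate offsets of the sampled RoIs, and the class labels of the sampled RoIs.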
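
How many foreground and background RoIs end up in a sample depends on how many candidates of each kind are available. The helper below, `sample_quota`, is a hypothetical name introduced only for illustration; it reproduces the counting logic of ProposalTargetCreator with its default `n_sample=128` and `pos_ratio=0.25`.

```python
import numpy as np


def sample_quota(n_sample, pos_ratio, n_pos_candidates, n_neg_candidates):
    """Return (n_pos, n_neg): how many foreground/background RoIs get sampled."""
    pos_quota = int(np.round(n_sample * pos_ratio))
    n_pos = min(pos_quota, n_pos_candidates)
    # background RoIs fill whatever is left of the sample
    n_neg = min(n_sample - n_pos, n_neg_candidates)
    return n_pos, n_neg


few_pos = sample_quota(128, 0.25, n_pos_candidates=10, n_neg_candidates=500)
many_pos = sample_quota(128, 0.25, n_pos_candidates=50, n_neg_candidates=500)
few_neg = sample_quota(128, 0.25, n_pos_candidates=50, n_neg_candidates=20)
```

With only 10 foreground candidates the sample is topped up with 118 background RoIs; with 50 candidates the foreground count is capped at 32; and when background candidates run short the sample simply ends up smaller than `n_sample`.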
Code implementation

class ProposalTargetCreator(object):
    """
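    Assign ground-truth boxes (bbox) to RoIs.

    Args:
        n_sample (int): number of RoIs to keep after sampling.
        pos_ratio (float): target fraction of foreground RoIs among the sampled RoIs.
        pos_iou_thresh (float): RoIs whose IoU is at or above this threshold are treated as foreground.
        neg_iou_thresh_hi (float): RoIs whose IoU lies in [neg_iou_thresh_lo, neg_iou_thresh_hi) are treated as background.
        neg_iou_thresh_lo (float): lower bound of that interval.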

    """

    def __init__(self,
                 n_sample=128,
                 pos_ratio=0.25, pos_iou_thresh=0.5,
                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0
                 ):
        self.n_sample = n_sample
        self.pos_ratio = pos_ratio
        self.pos_iou_thresh = pos_iou_thresh
        self.neg_iou_thresh_hi = neg_iou_thresh_hi
        self.neg_iou_thresh_lo = neg_iou_thresh_lo 

    def __call__(self, roi, bbox, label,
                 loc_normalize_mean=(0., 0., 0., 0.),
                 loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):
                 
        """

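        * :math:`S` is the number of sampled RoIs.
        * :math:`L` is the number of foreground classes.

        Args:
            roi (array): RoIs produced by the RPN. shape -> :math:`(R, 4)`
            bbox (array): ground-truth boxes. shape -> :math:`(R', 4)`.
            label (array): class labels of the ground-truth boxes. shape -> :math:`(R',)`.
            loc_normalize_mean (tuple of four floats): mean used to normalize the coordinate offsets.
            loc_normalize_std (tuple of four floats): standard deviation used to normalize the coordinate offsets.

        Returns:
            * **sample_roi**: the sampled RoIs. shape -> :math:`(S, 4)`.
            * **gt_roi_loc**: offsets from each sampled RoI to its matched bbox. shape -> :math:`(S, 4)`.

            * **gt_roi_label**: class label of each sampled RoI. shape -> :math:`(S,)`. value range -> :math:`[0, L]`, where 0 is background.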
        """
        n_bbox, _ = bbox.shape
        
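        # maximum number of foreground RoIs per image
        pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)

        # IoU between RoIs and bboxes, iou -> (len(roi), len(bbox))
        iou = bbox_iou(roi, bbox)

        # for each RoI, index of the bbox with the highest IoU
        gt_assignment = iou.argmax(axis=1)

        # for each RoI, the corresponding highest IoU
        max_iou = iou.max(axis=1)

        # Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].
        # label 0 is reserved for background
        gt_roi_label = label[gt_assignment] + 1

        # if there are more foreground RoIs than the quota, randomly sample down to the quota
        # foreground RoIs are those with IoU >= pos_iou_thresh.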
        pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]
        
        pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))
        if pos_index.size > 0:
            pos_index = np.random.choice(
                pos_index, size=pos_roi_per_this_image, replace=False)
        
        
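        # if there are more background RoIs than the remaining quota, randomly sample down to the quota
        # background RoIs are those with IoU in [neg_iou_thresh_lo, neg_iou_thresh_hi).
        neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
                             (max_iou >= self.neg_iou_thresh_lo))[0]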
        neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image
        neg_roi_per_this_image = int(min(neg_roi_per_this_image,
                                         neg_index.size))
        if neg_index.size > 0:
            neg_index = np.random.choice(
                neg_index, size=neg_roi_per_this_image, replace=False)

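        # merge the indices of the sampled foreground and background RoIs.
        keep_index = np.append(pos_index, neg_index)
        gt_roi_label = gt_roi_label[keep_index]
        gt_roi_label[pos_roi_per_this_image:] = 0  # everything after the foreground RoIs is background
        sample_roi = roi[keep_index]

        # roi -> bbox: compute the coordinate offsets
        gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])

        # normalize the offsets with the given mean and standard deviation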
        gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
                       ) / np.array(loc_normalize_std, np.float32))

        return sample_roi, gt_roi_loc, gt_roi_label
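
Both target creators call a `bbox2loc` helper that is not shown in this section. The sketch below assumes the usual Faster R-CNN box parameterization (center shift divided by the source box size, log ratio of the sizes) and the (y_min, x_min, y_max, x_max) box order. It illustrates the idea and may differ in detail from the helper the original code imports.

```python
import numpy as np


def bbox2loc(src_bbox, dst_bbox):
    """Offsets (dy, dx, dh, dw) that map each src box onto its dst box."""
    src_bbox = np.asarray(src_bbox, dtype=np.float32)
    dst_bbox = np.asarray(dst_bbox, dtype=np.float32)

    height = src_bbox[:, 2] - src_bbox[:, 0]
    width = src_bbox[:, 3] - src_bbox[:, 1]
    ctr_y = src_bbox[:, 0] + 0.5 * height
    ctr_x = src_bbox[:, 1] + 0.5 * width

    base_height = dst_bbox[:, 2] - dst_bbox[:, 0]
    base_width = dst_bbox[:, 3] - dst_bbox[:, 1]
    base_ctr_y = dst_bbox[:, 0] + 0.5 * base_height
    base_ctr_x = dst_bbox[:, 1] + 0.5 * base_width

    # guard against division by zero for degenerate source boxes
    eps = np.finfo(np.float32).eps
    height = np.maximum(height, eps)
    width = np.maximum(width, eps)

    dy = (base_ctr_y - ctr_y) / height
    dx = (base_ctr_x - ctr_x) / width
    dh = np.log(base_height / height)
    dw = np.log(base_width / width)
    return np.stack((dy, dx, dh, dw), axis=1)


# A 10x10 RoI whose target is the same box shifted by 5 pixels in y and x.
roi = np.array([[0., 0., 10., 10.]])
gt = np.array([[5., 5., 15., 15.]])
loc = bbox2loc(roi, gt)  # approximately [[0.5, 0.5, 0.0, 0.0]]

# Same normalization as in ProposalTargetCreator, with its default mean/std.
norm_loc = (loc - np.array((0., 0., 0., 0.), np.float32)) / np.array(
    (0.1, 0.1, 0.2, 0.2), np.float32)  # approximately [[5.0, 5.0, 0.0, 0.0]]
```

Dividing by the small default standard deviations scales the raw offsets up, so the regression targets are no longer tiny numbers close to zero.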

Loss functions
Faster R-CNN uses the cross-entropy loss for classification and the Smooth L1 loss for regression.
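
For the RPN, the classification loss has to cope with anchors labeled -1. The NumPy sketch below assumes that such anchors are simply excluded from the cross-entropy average; the function name `cross_entropy_ignore` and that averaging choice are assumptions made for illustration, not taken from the code in this article.

```python
import numpy as np


def cross_entropy_ignore(scores, labels, ignore_label=-1):
    """Mean cross-entropy over the rows whose label is not ignore_label.

    scores: (N, C) raw class scores (logits); labels: (N,) integer labels.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)
    keep = labels != ignore_label
    if not keep.any():
        return 0.0
    scores, labels = scores[keep], labels[keep]

    # numerically stable log-softmax
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())


# Three anchors with uniform scores; the third one is ignored.
uniform_loss = cross_entropy_ignore(np.zeros((3, 2)), np.array([0, 1, -1]))
```

With uniform scores over two classes, every kept anchor contributes log 2, so the mean is log 2 regardless of how many anchors are ignored.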
The Smooth L1 loss is used to compute the coordinate-offset error in the network. One thing to keep in mind when computing it: bounding boxes labeled as background do not contribute to the localization loss. The mathematical form of the Smooth L1 loss is:
$$L_{1;smooth}=\begin{cases}|x| & \text{if } |x|>\alpha;\\ \frac{1}{|\alpha|}x^2 & \text{if } |x|\leq \alpha\end{cases}$$
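The Smooth L1 loss is used for the following reasons:

1. The L1 loss is insensitive to outliers. With the L2 loss, the proposals early in training are widely spread and far from the ground-truth boxes, so the gradients can explode.
2. When the input is small, the gradient is also small, so training oscillates less.
In the formula, α is a hyperparameter, usually set to 1, and the two branches meet at |x| = α, so the loss function is continuous. Smooth L1 combines the L1 and L2 losses: when the input is small it behaves like the L2 loss, otherwise it behaves like the L1 loss. The RPN loss and the Fast R-CNN loss each set their own value of α.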
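
A NumPy sketch of the Smooth L1 loss in the form given above, together with a localization loss that skips background and ignored boxes. The function names and the normalization (dividing by the number of boxes with label >= 0) are assumptions made for illustration. Note that the Fast R-CNN paper writes the function slightly differently (0.5x^2 for |x| < 1, |x| - 0.5 otherwise); the code follows this article's formula.

```python
import numpy as np


def smooth_l1(x, alpha=1.0):
    """|x| where |x| > alpha, x**2 / |alpha| elsewhere (elementwise)."""
    x = np.asarray(x, dtype=np.float64)
    abs_x = np.abs(x)
    return np.where(abs_x > alpha, abs_x, x ** 2 / abs(alpha))


def loc_loss(pred_loc, gt_loc, label, alpha=1.0):
    """Localization loss over foreground boxes only.

    pred_loc, gt_loc: (N, 4) offsets; label: (N,) with -1 = ignore,
    0 = background, > 0 = foreground.
    """
    pred_loc = np.asarray(pred_loc, dtype=np.float64)
    gt_loc = np.asarray(gt_loc, dtype=np.float64)
    label = np.asarray(label)

    fg = label > 0  # background (0) and ignored (-1) boxes contribute nothing
    total = smooth_l1(pred_loc[fg] - gt_loc[fg], alpha).sum()
    # assumed normalization: number of boxes that carry a valid label
    n_valid = max(int((label >= 0).sum()), 1)
    return float(total / n_valid)


values = smooth_l1([0.5, 1.0, 2.0, -3.0])  # quadratic up to |x| = 1, linear beyond
example = loc_loss(
    pred_loc=np.zeros((3, 4)),
    gt_loc=[[0.5, 0, 0, 0], [2, 0, 0, 0], [3, 3, 3, 3]],
    label=[1, 0, -1],
)
```

In the example only the first box is foreground, so the large errors on the background and ignored boxes have no effect on the result.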
Summary
This section described in detail how predictions are matched to the ground truth during the Faster R-CNN loss computation, namely how anchors are matched to bboxes (AnchorTargetCreator) and how RoIs are matched to bboxes (ProposalTargetCreator), and showed the core code of both. Once this matching process is understood, the loss computation of the whole network is largely understood. The actual computation of the losses is comparatively easy to follow, so it is not covered in detail.
Note that AnchorTargetCreator and ProposalTargetCreator take part only in the forward pass and require no gradient computation.
The next section will describe how to train Faster R-CNN.