The batch_norm problem in slim

The great advantage of Python code is that it is easy to write. Its big downside, though, is that it can be very hard to read!!!
The code below is from FastMaskRCNN (https://github.com/CharlesShang/FastMaskRCNN). In actual runs, changing is_training from True to False changed the test results drastically! This tormented me for days. Eventually I found a workaround, and the trail led to the resnet_v1 function.
The code in question (the ResNet v1 model generator):

# Excerpt from resnet_v1.py. In the original file `resnet_utils` comes from
# the TF-Slim nets package (from nets import resnet_utils) and `bottleneck`
# is defined earlier in the same file; both are omitted here.
import tensorflow as tf

slim = tf.contrib.slim


def resnet_v1(inputs,
              blocks,
              num_classes=None,
              is_training=True,
              global_pool=True,
              output_stride=None,
              include_root_block=True,
              spatial_squeeze=True,
              reuse=None,
              scope=None):
  """Generator for v1 ResNet models.

  This function generates a family of ResNet v1 models. See the resnet_v1_*()
  methods for specific model instantiations, obtained by selecting different
  block instantiations that produce ResNets of various depths.

  Training for image classification on Imagenet is usually done with [224, 224]
  inputs, resulting in [7, 7] feature maps at the output of the last ResNet
  block for the ResNets defined in [1] that have nominal stride equal to 32.
  However, for dense prediction tasks we advise that one uses inputs with
  spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In
  this case the feature maps at the ResNet output will have spatial shape
  [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
  and corners exactly aligned with the input image corners, which greatly
  facilitates alignment of the features to the image. Using as input [225, 225]
  images results in [8, 8] feature maps at the output of the last ResNet block.

  For dense prediction tasks, the ResNet needs to run in fully-convolutional
  (FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all
  have nominal stride equal to 32 and a good choice in FCN mode is to use
  output_stride=16 in order to increase the density of the computed features at
  small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915.

  Args:
    inputs: A tensor of size [batch, height_in, width_in, channels].
    blocks: A list of length equal to the number of ResNet blocks. Each element
      is a resnet_utils.Block object describing the units in the block.
    num_classes: Number of predicted classes for classification tasks. If None
      we return the features before the logit layer.
    is_training: whether is training or not.
    global_pool: If True, we perform global average pooling before computing the
      logits. Set to True for image classification, False for dense prediction.
    output_stride: If None, then the output will be computed at the nominal
      network stride. If output_stride is not None, it specifies the requested
      ratio of input to output spatial resolution.
    include_root_block: If True, include the initial convolution followed by
      max-pooling, if False excludes it.
    spatial_squeeze: if True, logits is of shape [B, C], if False logits is
      of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.
  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
      If global_pool is False, then height_out and width_out are reduced by a
      factor of output_stride compared to the respective height_in and width_in,
      else both height_out and width_out equal one. If num_classes is None, then
      net is the output of the last ResNet block, potentially after global
      average pooling. If num_classes is not None, net contains the pre-softmax
      activations.
    end_points: A dictionary from components of the network to the corresponding
      activation.

  Raises:
    ValueError: If the target output_stride is not valid.
  """
  with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
    end_points_collection = sc.name + '_end_points'
    with slim.arg_scope([slim.conv2d, bottleneck,
                         resnet_utils.stack_blocks_dense],
                        outputs_collections=end_points_collection):
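      # NOTE: the stock TF-Slim resnet_v1 passes the function's is_training
      # argument through here (is_training=is_training); hard-coding True is
      # the workaround discussed at the end of this post.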
      with slim.arg_scope([slim.batch_norm], is_training=True):
        net = inputs
        if include_root_block:
          if output_stride is not None:
            if output_stride % 4 != 0:
              raise ValueError('The output_stride needs to be a multiple of 4.')
            output_stride /= 4
          net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
        net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
        if global_pool:
          # Global average pooling.
          net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
        if num_classes is not None:
          net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                            normalizer_fn=None, scope='logits')
        # Define logits unconditionally; the original excerpt left it
        # undefined when spatial_squeeze=False, yet returns it below.
        logits = net
        if spatial_squeeze:
          logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
        # Convert end_points_collection into a dictionary of end_points.
        end_points = slim.utils.convert_collection_to_dict(end_points_collection)
        if num_classes is not None:
          end_points['predictions'] = slim.softmax(logits, scope='predictions')
        return logits, end_points
resnet_v1.default_image_size = 224
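
For reference, this is roughly how the resnet_v1_50 wrapper in the same file builds the blocks argument and calls resnet_v1 (a sketch based on the TF-Slim code of that era, not copied from FastMaskRCNN; each tuple is (depth, depth_bottleneck, stride)):

def resnet_v1_50(inputs, num_classes=None, is_training=True,
                 global_pool=True, output_stride=None,
                 reuse=None, scope='resnet_v1_50'):
  blocks = [
      resnet_utils.Block('block1', bottleneck,
                         [(256, 64, 1)] * 2 + [(256, 64, 2)]),
      resnet_utils.Block('block2', bottleneck,
                         [(512, 128, 1)] * 3 + [(512, 128, 2)]),
      resnet_utils.Block('block3', bottleneck,
                         [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
      resnet_utils.Block('block4', bottleneck, [(2048, 512, 1)] * 3),
  ]
  return resnet_v1(inputs, blocks, num_classes, is_training,
                   global_pool=global_pool, output_stride=output_stride,
                   include_root_block=True, reuse=reuse, scope=scope)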

I wanted to attack the problem starting from this function, so I analyzed it. But after a certain point the analysis led me into slim's own source code. That is not something we control, so doesn't that suggest the problem sits in the slim source itself?
1. First, the scope that is_training controls
From looking into the docs: among the layers functions, batch_norm and dropout take an is_training argument, and resnet_utils is likewise controlled by this is_training parameter.
with slim.arg_scope([slim.batch_norm], is_training=True): supplies defaults to the layer functions; concretely, it sets slim.batch_norm's default to is_training=True. (See http://blog.csdn.net/weixin_35653315/article/details/78160886 for an explanation of how arg_scope is used.) A minimal sketch of this mechanism follows.
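
To make this concrete, here is a minimal sketch (hypothetical tensors, not the FastMaskRCNN code) of how that arg_scope line injects the default:

import tensorflow as tf

slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])
with slim.arg_scope([slim.batch_norm], is_training=True):
  # Every slim.batch_norm created inside this scope receives
  # is_training=True unless the call site overrides it explicitly.
  net = slim.conv2d(inputs, 64, [3, 3],
                    normalizer_fn=slim.batch_norm, scope='conv1')
  net = slim.batch_norm(net, scope='bn')  # same as is_training=True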
2. A look at a few more important functions
# Same-padding 2-D convolution
net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')

net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits')
logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
end_points['predictions'] = slim.softmax(logits, scope='predictions')

None of these functions turned out to have much to do with it.
3. So why does this happen?
Let's first look at what dropout means: during training of a deep network, dropout temporarily drops neural-network units from the network with a certain probability. But dropout should have little to do with this problem; at test time slim.dropout is effectively an identity op, as the sketch below shows.
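
A minimal sketch (hypothetical tensors, TF 1.x API): slim.dropout only perturbs activations when is_training=True.

import tensorflow as tf

slim = tf.contrib.slim

x = tf.placeholder(tf.float32, [None, 128])
train_out = slim.dropout(x, keep_prob=0.5, is_training=True)   # randomly zeroes units
test_out = slim.dropout(x, keep_prob=0.5, is_training=False)   # identity pass-through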
Next, look at batch_norm.
# From the slim.batch_norm docstring:
    is_training: Whether or not the layer is in training mode. In training mode
      it would accumulate the statistics of the moments into `moving_mean` and
      `moving_variance` using an exponential moving average with the given
      `decay`. When it is not in training mode then it would use the values of
      the `moving_mean` and the `moving_variance`.

The above is taken from the TensorFlow source. The sketch below shows what it means in practice.
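
A plausible reading, based on slim's documented batch_norm behavior rather than anything specific to FastMaskRCNN: with is_training=True the moving statistics are only updated by the ops batch_norm adds to tf.GraphKeys.UPDATE_OPS. If the training loop never runs those ops, moving_mean stays at 0 and moving_variance at 1, and switching to is_training=False then normalizes with those stale values, which would produce exactly this kind of train/test gap. A minimal sketch of the standard wiring:

import tensorflow as tf

slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])
net = slim.batch_norm(inputs, is_training=True, scope='bn')
loss = tf.reduce_mean(tf.square(net))  # stand-in loss for illustration

# batch_norm registers its moving-average updates in UPDATE_OPS; if the
# train op does not depend on them, the moving statistics never change.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)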
Finally, at the end of http://blog.csdn.net/cyiano/article/details/75006883 I found a related write-up, and the situation it describes is very close to mine. It seems to be the same problem. But that post does not go into much detail either, so it looks like a deeper read of slim's code is needed.
When is_training in with slim.arg_scope([slim.batch_norm], is_training=...) is False, the same problem as in that blog shows up. My current workaround is simply to hard-code is_training=True in with slim.arg_scope([slim.batch_norm], is_training=True), and then the results are normal.
But what my code is doing is testing!!!
weird!!!
