RuntimeError: CUDA out of memory.

hh@hh:/code/mmdetection$ sudo CUDA_VISIBLE_DEVICES=0,1,2,3 python3 tools/train.py configs/cascade_rcnn_x101_64x4d_fpn_1x.py --work_dir ./work_dirs/cascade_rcnn_x101_64x4d_fpn_1x/ --validate --gpus 4
2019-09-23 13:24:42,014 - INFO - Distributed training: False
2019-09-23 13:24:45,032 - INFO - load model from: open-mmlab://resnext101_64x4d
loading annotations into memory...
Done (t=19.72s)
creating index...
index created!
2019-09-23 13:25:16,517 - INFO - Start running, host: root@hy, work_dir: /code/mmdetection/work_dirs/cascade_rcnn_x101_64x4d_fpn_1x
2019-09-23 13:25:16,518 - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
  File "tools/train.py", line 108, in <module>
    main()
  File "tools/train.py", line 104, in main
    logger=logger)
  File "/code/mmdetection/mmdet/apis/train.py", line 60, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/code/mmdetection/mmdet/apis/train.py", line 221, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/code/mmcv/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/code/mmcv/mmcv/runner/runner.py", line 264, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/code/mmdetection/mmdet/apis/train.py", line 38, in batch_processor
    losses = model(**data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/code/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/code/mmdetection/mmdet/models/detectors/base.py", line 86, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/code/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 175, in forward_train
    proposal_list = self.rpn_head.get_bboxes(*proposal_inputs)
  File "/code/mmdetection/mmdet/core/fp16/decorators.py", line 127, in new_func
    return old_func(*args, **kwargs)
  File "/code/mmdetection/mmdet/models/anchor_heads/anchor_head.py", line 221, in get_bboxes
    scale_factor, cfg, rescale)
  File "/code/mmdetection/mmdet/models/anchor_heads/rpn_head.py", line 71, in get_bboxes_single
    rpn_cls_score = rpn_cls_score.reshape(-1)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 10.76 GiB total capacity; 9.01 GiB already allocated; 3.56 MiB free; 832.69 MiB cached)
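
The last line of the traceback mixes GiB and MiB, which makes the figures hard to compare at a glance. The following is a minimal sketch, not part of the original post, that parses that one message with the standard library and converts every size to MiB. The helper name `parse_oom_message` and the regular expression are illustrative assumptions that match only the message format shown above; other PyTorch versions word this message differently.

```python
import re

# Final line of the traceback above, copied verbatim.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB "
    "(GPU 0; 10.76 GiB total capacity; 9.01 GiB already allocated; "
    "3.56 MiB free; 832.69 MiB cached)"
)

# Size of each unit expressed in MiB.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Hypothetical helper: return the sizes named in the message, in MiB."""
    sizes = {}

    # The size that the failing allocation asked for.
    requested = re.search(
        r"Tried to allocate (\d+(?:\.\d+)?) (KiB|MiB|GiB)", message
    )
    if requested:
        value, unit = requested.groups()
        sizes["requested"] = float(value) * UNIT_TO_MIB[unit]

    # The figures inside the parentheses: "<number> <unit> <label>".
    pattern = re.compile(
        r"(\d+(?:\.\d+)?) (KiB|MiB|GiB) "
        r"(total capacity|already allocated|free|cached)"
    )
    for value, unit, label in pattern.findall(message):
        sizes[label] = float(value) * UNIT_TO_MIB[unit]

    return sizes


sizes = parse_oom_message(MESSAGE)
for label, mib in sizes.items():
    print(f"{label:>17}: {mib:10.2f} MiB")
```

With everything in one unit, two things in the log stand out: the failed request (2.00 MiB) is smaller than the memory reported as free (3.56 MiB), and almost all of GPU 0 is taken up by memory that is already allocated.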
