TensorFlow Memory Configuration


Original code

message GPUOptions {
  // A value between 0 and 1 that indicates what fraction of the
  // available GPU memory to pre-allocate for each process.  1 means
  // to pre-allocate all of the GPU memory, 0.5 means the process
  // allocates ~50% of the available GPU memory.
  double per_process_gpu_memory_fraction = 1;

  // The type of GPU allocation strategy to use.
  //
  // Allowed values:
  // "": The empty string (default) uses a system-chosen default
  //     which may change over time.
  //
  // "BFC": A "Best-fit with coalescing" algorithm, simplified from a
  //        version of dlmalloc.
  string allocator_type = 2;

  // Delay deletion of up to this many bytes to reduce the number of
  // interactions with gpu driver code.  If 0, the system chooses
  // a reasonable default (several MBs).
  int64 deferred_deletion_bytes = 3;

  // If true, the allocator does not pre-allocate the entire specified
  // GPU memory region, instead starting small and growing as needed.
  bool allow_growth = 4;

  // A comma-separated list of GPU ids that determines the 'visible'
  // to 'virtual' mapping of GPU devices.  For example, if TensorFlow
  // can see 8 GPU devices in the process, and one wanted to map
  // visible GPU devices 5 and 3 as "/gpu:0", and "/gpu:1", then one
  // would specify this field as "5,3".  This field is similar in
  // spirit to the CUDA_VISIBLE_DEVICES environment variable, except
  // it applies to the visible GPU devices in the process.
  //
  // NOTE: The GPU driver provides the process with the visible GPUs
  // in an order which is not guaranteed to have any correlation to
  // the *physical* GPU id in the machine.  This field is used for
  // remapping "visible" to "virtual", which means this operates only
  // after the process starts.  Users are required to use vendor
  // specific mechanisms (e.g., CUDA_VISIBLE_DEVICES) to control the
  // physical to visible device mapping prior to invoking TensorFlow.
  string visible_device_list = 5;

  // In the event polling loop sleep this many microseconds between
  // PollEvents calls, when the queue is not empty.  If value is not
  // set or set to 0, gets set to a non-zero default.
  int32 polling_active_delay_usecs = 6;

  // In the event polling loop sleep this many milliseconds between
  // PollEvents calls, when the queue is empty.  If value is not
  // set or set to 0, gets set to a non-zero default.
  int32 polling_inactive_delay_msecs = 7;

  // Force all tensors to be gpu_compatible. On a GPU-enabled TensorFlow,
  // enabling this option forces all CPU tensors to be allocated with Cuda
  // pinned memory. Normally, TensorFlow will infer which tensors should be
  // allocated as the pinned memory. But in case where the inference is
  // incomplete, this option can significantly speed up the cross-device memory
  // copy performance as long as it fits the memory.
  // Note that this option is not something that should be
  // enabled by default for unknown or very large models, since all Cuda pinned
  // memory is unpageable, having too much pinned memory might negatively impact
  // the overall host system performance.
  bool force_gpu_compatible = 8;
};
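
Most of these proto fields can be set from Python through tf.GPUOptions and tf.ConfigProto (the TF 1.x API). The sketch below is illustrative only: it assumes the process can see at least two GPUs, and the fraction value is arbitrary.

import tensorflow as tf

# A minimal sketch: populate the GPUOptions fields shown above.
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=0.4,  # pre-allocate ~40% of each visible GPU
    visible_device_list="1,0",            # map visible GPU 1 -> "/gpu:0", GPU 0 -> "/gpu:1"
    force_gpu_compatible=True,            # allocate CPU tensors in CUDA pinned memory
)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)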

Usage

  • By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.
  • In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two Config options on the Session to control this.
  • The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended.
  • The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated.

  • The allocator uses the BFC algorithm; for details on the algorithm see http://blog.csdn.net/qq_33096883/article/details/76598786.
  • Sample code
  • import tensorflow as tf
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = '0,1'                 # expose only physical GPUs 0 and 1 to the process
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.5   # use at most 50% of each visible GPU's memory
    config.gpu_options.allow_growth = True                     # allocate on demand instead of grabbing it all up front
    config.gpu_options.allocator_type = 'BFC'                  # use the BFC allocator
    sess = tf.Session(config=config)
    
    
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                          log_device_placement=True)) as sess:
        ...

    allow_soft_placement=True lets TensorFlow fall back to the CPU automatically when an op cannot run on the GPU (for example, when it has no GPU kernel or the requested device does not exist); without it, such a placement raises an error.
    log_device_placement=True logs the device on which each operation is placed.
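
    As a quick illustration (a sketch; the device id is deliberately one the process is assumed not to have), pinning an op to a nonexistent GPU normally fails, but with soft placement it silently runs on the CPU and the placement decision is logged:

    import tensorflow as tf

    with tf.device('/gpu:2'):  # assume this device does not exist in the process
        a = tf.constant([1.0, 2.0], name='a')
        b = tf.constant([3.0, 4.0], name='b')
        c = a + b

    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(c))  # runs on the CPU; per-op placements are logged to stderr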
