TensorFlow 메모리 설정
10431 단어 DP 프레임
원본 코드
message GPUOptions {
// A value between 0 and 1 that indicates what fraction of the
// available GPU memory to pre-allocate for each process. 1 means
// to pre-allocate all of the GPU memory, 0.5 means the process
// allocates ~50% of the available GPU memory.
double per_process_gpu_memory_fraction = 1;
// The type of GPU allocation strategy to use.
//
// Allowed values:
// "": The empty string (default) uses a system-chosen default
// which may change over time.
//
// "BFC": A "Best-fit with coalescing" algorithm, simplified from a
// version of dlmalloc.
string allocator_type = 2;
// Delay deletion of up to this many bytes to reduce the number of
// interactions with gpu driver code. If 0, the system chooses
// a reasonable default (several MBs).
int64 deferred_deletion_bytes = 3;
// If true, the allocator does not pre-allocate the entire specified
// GPU memory region, instead starting small and growing as needed.
bool allow_growth = 4;
// A comma-separated list of GPU ids that determines the 'visible'
// to 'virtual' mapping of GPU devices. For example, if TensorFlow
// can see 8 GPU devices in the process, and one wanted to map
// visible GPU devices 5 and 3 as "/gpu:0", and "/gpu:1", then one
// would specify this field as "5,3". This field is similar in
// spirit to the CUDA_VISIBLE_DEVICES environment variable, except
// it applies to the visible GPU devices in the process.
//
// NOTE: The GPU driver provides the process with the visible GPUs
// in an order which is not guaranteed to have any correlation to
// the *physical* GPU id in the machine. This field is used for
// remapping "visible" to "virtual", which means this operates only
// after the process starts. Users are required to use vendor
// specific mechanisms (e.g., CUDA_VISIBLE_DEVICES) to control the
// physical to visible device mapping prior to invoking TensorFlow.
// GPU ID, GPU 。
string visible_device_list = 5;
// In the event polling loop sleep this many microseconds between
// PollEvents calls, when the queue is not empty. If value is not
// set or set to 0, gets set to a non-zero default.
int32 polling_active_delay_usecs = 6;
// In the event polling loop sleep this many millisconds between
// PollEvents calls, when the queue is empty. If value is not
// set or set to 0, gets set to a non-zero default.
int32 polling_inactive_delay_msecs = 7;
// Force all tensors to be gpu_compatible. On a GPU-enabled TensorFlow,
// enabling this option forces all CPU tensors to be allocated with Cuda
// pinned memory. Normally, TensorFlow will infer which tensors should be
// allocated as the pinned memory. But in case where the inference is
// incomplete, this option can significantly speed up the cross-device memory
// copy performance as long as it fits the memory.
// Note that this option is not something that should be
// enabled by default for unknown or very large models, since all Cuda pinned
// memory is unpageable, having too much pinned memory might negatively impact
// the overall host system performance.
bool force_gpu_compatible = 8;
};
사용
import tensorflow as tf
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0, 1' # GPU
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5 # gpu50%
config.gpu_options.allow_growth = True #
config.gpu_options.allocator_type = 'BFC' # BFC
sess = tf.Session(config = config, ...)
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)):
allow_soft_placement tensorflow GPU , CPU 。
log_device_placement 。