CUDA Study Notes 3

Device management
NVIDIA provides several ways to query and manage GPU devices. Knowing how to query GPU information is important because it helps you choose a kernel's execution configuration.
This post covers the following two topics:

CUDA runtime API functions

The NVIDIA system management command-line tool (nvidia-smi)
Querying GPU information with the runtime API
The following function returns all the information about a GPU device:
cudaError_t cudaGetDeviceProperties(cudaDeviceProp *prop, int device);
The GPU information is returned in a cudaDeviceProp structure.
Code:

#include <cuda_runtime.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    printf("%s Starting...\n", argv[0]);

    // Count the CUDA-capable devices; stop if the runtime reports an error.
    int deviceCount = 0;
    cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
    if (error_id != cudaSuccess) {
        printf("cudaGetDeviceCount returned %d\n-> %s\n",
               (int)error_id, cudaGetErrorString(error_id));
        printf("Result = FAIL\n");
        exit(EXIT_FAILURE);
    }
    if (deviceCount == 0) {
        printf("There are no available device(s) that support CUDA\n");
        exit(EXIT_FAILURE);  // nothing to query
    } else {
        printf("Detected %d CUDA Capable device(s)\n", deviceCount);
    }

    // Query the properties of device 0 and the driver/runtime versions.
    int dev = 0, driverVersion = 0, runtimeVersion = 0;
    cudaSetDevice(dev);
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, dev);
    printf("Device %d: \"%s\"\n", dev, deviceProp.name);
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    // Versions are encoded as 1000 * major + 10 * minor.
    printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf(" CUDA Capability Major/Minor version number: %d.%d\n",
           deviceProp.major, deviceProp.minor);
    // totalGlobalMem is in bytes; dividing by 1024^3 gives GBytes.
    printf(" Total amount of global memory: %.2f GBytes (%llu bytes)\n",
           (float)deviceProp.totalGlobalMem / pow(1024.0, 3),
           (unsigned long long)deviceProp.totalGlobalMem);
    // clockRate and memoryClockRate are reported in kHz.
    printf(" GPU Clock rate: %.0f MHz (%0.2f GHz)\n",
           deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
    printf(" Memory Clock rate: %.0f Mhz\n", deviceProp.memoryClockRate * 1e-3f);
    printf(" Memory Bus Width: %d-bit\n", deviceProp.memoryBusWidth);
    if (deviceProp.l2CacheSize) {
        printf(" L2 Cache Size: %d bytes\n", deviceProp.l2CacheSize);
    }

    printf(" Max Texture Dimension Size (x,y,z) 1D=(%d), 2D=(%d,%d), 3D=(%d,%d,%d)\n",
           deviceProp.maxTexture1D,
           deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
           deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1],
           deviceProp.maxTexture3D[2]);

    printf(" Max Layered Texture Size (dim) x layers 1D=(%d) x %d, 2D=(%d,%d) x %d\n",
           deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],
           deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],
           deviceProp.maxTexture2DLayered[2]);

    // totalConstMem, sharedMemPerBlock and memPitch are size_t, hence %zu.
    printf(" Total amount of constant memory: %zu bytes\n", deviceProp.totalConstMem);
    printf(" Total amount of shared memory per block: %zu bytes\n",
           deviceProp.sharedMemPerBlock);
    printf(" Total number of registers available per block: %d\n",
           deviceProp.regsPerBlock);
    printf(" Warp size: %d\n", deviceProp.warpSize);
    printf(" Maximum number of threads per multiprocessor: %d\n",
           deviceProp.maxThreadsPerMultiProcessor);
    printf(" Maximum number of threads per block: %d\n",
           deviceProp.maxThreadsPerBlock);

    printf(" Maximum sizes of each dimension of a block: %d x %d x %d\n",
           deviceProp.maxThreadsDim[0],
           deviceProp.maxThreadsDim[1],
           deviceProp.maxThreadsDim[2]);

    printf(" Maximum sizes of each dimension of a grid: %d x %d x %d\n",
           deviceProp.maxGridSize[0],
           deviceProp.maxGridSize[1],
           deviceProp.maxGridSize[2]);

    printf(" Maximum memory pitch: %zu bytes\n", deviceProp.memPitch);

    exit(EXIT_SUCCESS);
}

Compile and run:

$ nvcc checkDeviceInfor.cu -o checkDeviceInfor
$ ./checkDeviceInfor

Output:

./checkDeviceInfor Starting...
Detected 2 CUDA Capable device(s)
Device 0: "Tesla M2070"
CUDA Driver Version / Runtime Version 5.5 / 5.5
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 5.25 GBytes (5636554752 bytes)
GPU Clock rate: 1147 MHz (1.15 GHz)
Memory Clock rate: 1566 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes

Determining the best GPU
On a system with multiple GPUs, you have to pick one of them as your device. A simple way to pick the GPU with the best compute performance is to choose the one with the most multiprocessors, which the following code does:

int numDevices = 0;
cudaGetDeviceCount(&numDevices);
if (numDevices > 1) {
    int maxMultiprocessors = 0, maxDevice = 0;
    for (int device = 0; device < numDevices; device++) {
        cudaDeviceProp props;
        cudaGetDeviceProperties(&props, device);
        // keep the device with the most multiprocessors seen so far
        if (maxMultiprocessors < props.multiProcessorCount) {
            maxMultiprocessors = props.multiProcessorCount;
            maxDevice = device;
        }
    }
    cudaSetDevice(maxDevice);
}

Querying GPU information with nvidia-smi
nvidia-smi is a command-line tool that helps you manage GPU devices; it can query and modify device state.
nvidia-smi has many uses. For example, the following command lists the GPUs installed in the system:

$ nvidia-smi -L
GPU 0: Tesla M2070 (UUID: GPU-68df8aec-e85c-9934-2b81-0c9e689a43a7)
GPU 1: Tesla M2070 (UUID: GPU-382f23c1-5160-01e2-3291-ff9628930b70)

You can then query detailed information about GPU 0 with the following command:

$ nvidia-smi -q -i 0

The -d option of this command takes one of the following categories and trims the information nvidia-smi displays to that category:
MEMORY
UTILIZATION
ECC
TEMPERATURE
POWER
CLOCK
COMPUTE
PIDS
PERFORMANCE
SUPPORTED_CLOCKS
PAGE_RETIREMENT
ACCOUNTING
For example, to display only device memory information:

$ nvidia-smi -q -i 0 -d MEMORY

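Device configuration
On a multi-GPU system, nvidia-smi shows the properties of every GPU, and the GPUs are numbered starting from 0. The environment variable CUDA_VISIBLE_DEVICES lets you choose which GPUs an application uses without modifying the application.
Setting CUDA_VISIBLE_DEVICES=2 hides the other GPUs so that the application can use only GPU 2, which it sees as device 0. You can also list several GPUs: with CUDA_VISIBLE_DEVICES=2,3 the application sees two devices, whose device IDs are 0 and 1 respectively.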
