16. GPU
Created: December 4, 2021, 4:15 PM
GPUs are SIMD Engines Underneath
- instruction pipeline operates like a SIMD pipeline
- However, programming is done using threads, NOT SIMD instructions
How Can You Exploit Parallelism Here?
for (i = 0; i < N; i++)
    C[i] = A[i] + B[i];
Three options for exploiting instruction-level parallelism in the code above:
- Sequential (SISD)
- Data-Parallel (SIMD)
- Multithreaded (MIMD/SPMD)
1. Sequential (SISD)
- Pipelined processor
- Out-of-order execution processor
- Independent instructions executed when ready
- Different iterations are present in the instruction window and can execute in parallel in multiple functional units
- the loop is dynamically unrolled by the hardware (see the sketch after this list)
- Superscalar or VLIW processor
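To make the dynamic unrolling concrete, here is a rough C sketch (the function name is illustrative, and it assumes N is a multiple of 4): unrolling the loop in software exposes the same independent operations that the out-of-order hardware finds on its own.
void vec_add_unrolled(const float *A, const float *B, float *C, int N)
{
    /* Software unrolling that mimics what the OoO hardware does
       dynamically; assumes N % 4 == 0 for brevity. */
    for (int i = 0; i < N; i += 4) {
        C[i]     = A[i]     + B[i];     /* these four adds are independent, */
        C[i + 1] = A[i + 1] + B[i + 1]; /* so an OoO core can issue them    */
        C[i + 2] = A[i + 2] + B[i + 2]; /* to different functional units    */
        C[i + 3] = A[i + 3] + B[i + 3]; /* from the same instruction window */
    }
}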
2. Data-Parallel (SIMD)
- Each iteration is independent
- Programmer or compiler generates a SIMD instruction to execute the same instruction from all iterations across different data (a strip-mined sketch follows this list)
- Best executed by a SIMD processor (vector, array)
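A minimal strip-mined sketch of this option (VLEN and the function name are illustrative; a real vectorizing compiler would emit actual vector instructions such as the VLD/VADD/VST notation used later in these notes):
void vec_add_simd_sketch(const float *A, const float *B, float *C, int N)
{
    /* Each outer iteration corresponds to one VLD/VLD/VADD/VST
       sequence over VLEN elements. VLEN is a hypothetical vector
       length; assumes N % VLEN == 0 for brevity. */
    enum { VLEN = 8 };
    for (int i = 0; i < N; i += VLEN) {
        for (int j = 0; j < VLEN; j++)      /* conceptually ONE VADD:      */
            C[i + j] = A[i + j] + B[i + j]; /* same op, VLEN data elements */
    }
}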
3. Multithreaded
- Each iteration is independent
- Programmer or compiler generates a thread to execute each iteration. Each thread does the same thing, but on different data (see the CUDA sketch after this list)
- Can be executed on a MIMD machine
- This particular model is also called SPMD (Single Program Multiple Data)
- Can be executed on a SIMT machine (Single Instruction Multiple Thread)
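A minimal CUDA sketch of this option (the kernel name is illustrative): every thread runs the same kernel code, and each one handles the single iteration selected by its thread index.
__global__ void vecAdd(const float *A, const float *B, float *C, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // this thread's iteration i
    if (i < N)           // guard: N need not be a multiple of the block size
        C[i] = A[i] + B[i];
}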
GPU is a SIMD (SIMT) Machine
- It is programmed using threads (SPMD programming model)
- Each thread executes the same code but operates on a different piece of data
- A set of threads executing the same instruction is dynamically grouped into a warp (wavefront) by the hardware (see the launch sketch below)
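A host-side launch sketch for a kernel like vecAdd above (256 threads per block is an arbitrary choice; N and the d_* device pointers are assumed to be allocated and initialized already). The hardware then groups each block's threads into warps of 32.
int threadsPerBlock = 256;                                    // arbitrary choice
int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;  // round up
vecAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C, N);     // d_*: device pointers (assumed)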
SPMD on SIMT Machine
SIMD vs. SIMT Execution Model
- SIMD: A single sequential instruction stream of SIMD instructions → each instruction specifies multiple data inputs
- [VLD, VADD, VST], VLEN
- SIMT: Multiple instruction streams of scalar instructions → threads grouped dynamically into warps
- [LD, ADD, ST], NumThreads
- Advantages of SIMT:
- Can treat each thread separately (see the divergence sketch after this list)
- Can group threads into warps flexibly
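Treating each thread separately is what makes divergent control flow expressible: each thread may take its own branch, which a single SIMD instruction stream cannot do directly. A small illustrative CUDA kernel (name and operation are assumptions, not from the source):
__global__ void clampNegatives(float *C, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N && C[i] < 0.0f)  // each thread takes its own branch;
        C[i] = 0.0f;           // the hardware masks inactive lanes of the warp
}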
Fine-Grained Multithreading of Warps
- Assume a warp consists of 32 threads
- If you have 32K iterations, and 1 iteration/thread → 1K warps (arithmetic sketched below)
- Warps can be interleaved on the same pipeline → fine-grained multithreading of warps
(Figure: warps interleaved on one pipeline; e.g., warp 20 runs iterations 20·32 + 1, 20·32 + 2, … while warp 33 runs iterations 33·32 + 1, 33·32 + 2, …)
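The warp count follows from simple arithmetic (a sketch; 32 is the NVIDIA warp size):
const int WARP_SIZE = 32;                  // NVIDIA warp size
int iterations = 32 * 1024;                // 32K iterations, one thread each
int numWarps = (iterations + WARP_SIZE - 1) / WARP_SIZE;  // = 1024 → 1K warps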
Warps and Warp-Level FGMT
- Warp: A set of threads that execute the same instruction (on different data elements) → SIMT (Nvidia-speak)
- All threads run the same code