16. GPU

Created: December 4, 2021, 4:15 PM

GPUs are SIMD Engines Underneath

  • instruction pipeline operates like a SIMD pipeline
  • However, programming is done using threads, NOT SIMD instructions

How Can You Exploit Parallelism Here?

for (int i = 0; i < N; i++)
	C[i] = A[i] + B[i];

Three options for exploiting the instruction-level parallelism in the code above:

  1. Sequential (SISD)
  2. Data-Parallel (SIMD)
  3. Multithreaded (MIMD/SPMD)

1. Sequential (SISD)

  • Pipelined processor
  • Out-of-order execution processor
    • Independent instructions executed when ready
    • Different iterations are present in the instruction window and can execute in parallel in multiple functional units
    • The loop is effectively unrolled dynamically by the hardware (sketched after this list)
  • Superscalar or VLIW processor
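
To make the dynamic unrolling concrete, here is a minimal sketch of what the out-of-order window effectively achieves, written out by hand as a 4-way unrolled loop (the function name add_unrolled and the int element type are illustrative assumptions). The hardware does this renaming and reordering on its own; the programmer keeps the simple sequential loop.

/* Illustrative only: the four adds per pass are independent and can
   issue to different functional units in the same cycle. */
void add_unrolled(const int *A, const int *B, int *C, int N) {
    int i;
    for (i = 0; i + 3 < N; i += 4) {
        C[i]     = A[i]     + B[i];
        C[i + 1] = A[i + 1] + B[i + 1];
        C[i + 2] = A[i + 2] + B[i + 2];
        C[i + 3] = A[i + 3] + B[i + 3];
    }
    for (; i < N; i++)          /* remainder when N is not a multiple of 4 */
        C[i] = A[i] + B[i];
}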

2. Data-Parallel (SIMD)

  • Each iteration is independent
  • Programmer or compiler generates a SIMD instruction that executes the same operation from all iterations on different data (see the intrinsics sketch after this list)
  • Best executed by a SIMD processor (vector, array)
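
As a concrete sketch of the SIMD option, assuming x86 AVX2 as the target (so VLEN = 8 for 32-bit integers): each intrinsic below corresponds to one of the vector operations named later in these notes (VLD, VADD, VST).

#include <immintrin.h>  /* AVX2 intrinsics; compile host code with -mavx2 */

/* One vector instruction processes 8 int elements per iteration. */
void add_simd(const int *A, const int *B, int *C, int N) {
    int i;
    for (i = 0; i + 7 < N; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)&A[i]);  /* VLD  */
        __m256i vb = _mm256_loadu_si256((const __m256i *)&B[i]);  /* VLD  */
        __m256i vc = _mm256_add_epi32(va, vb);                    /* VADD */
        _mm256_storeu_si256((__m256i *)&C[i], vc);                /* VST  */
    }
    for (; i < N; i++)  /* scalar remainder */
        C[i] = A[i] + B[i];
}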

3. Multithreaded

  • Each iteration is independent
  • Programmer or compiler generates a thread to execute each iteration. Each thread does the same thing, but on different data (see the kernel sketch after this list)
  • Can be executed on a MIMD machine
  • This particular model is also called SPMD (Single Program Multiple Data)
  • Can be executed on a SIMT machine (Single Instruction Multiple Thread)
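
A minimal CUDA sketch of the SPMD/SIMT option (the kernel name vecAdd and the int element type are assumptions, mirroring the loop above): the programmer writes one scalar thread; each thread computes its own iteration index and handles exactly one element.

__global__ void vecAdd(const int *A, const int *B, int *C, int N) {
    /* Every thread runs this same code (SPMD); the built-in indices
       select which iteration of the original loop this thread is. */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)              /* guard: the launch rounds the thread count up */
        C[i] = A[i] + B[i];
}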

GPU is a SIMD (SIMT) Machine

  • It is programmed using threads (SPMD programming model)
    • Each thread executes the same code but operates on a different piece of data
  • A set of threads executing the same instruction is dynamically grouped into a warp (wavefront) by the hardware (see the launch sketch below)
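
Continuing the sketch, a hypothetical host-side launch of the vecAdd kernel above shows where warps come from: the programmer only chooses a block size, and the hardware carves each block into warps of 32 (d_A, d_B, d_C are assumed device-resident arrays, allocated and filled earlier with cudaMalloc/cudaMemcpy).

int N = 32768;                        /* 32K iterations                      */
int threadsPerBlock = 256;            /* hardware forms 256/32 = 8 warps/block */
int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;  /* 128 blocks     */
vecAdd<<<blocks, threadsPerBlock>>>(d_A, d_B, d_C, N);
/* 128 blocks x 8 warps = 1K warps total, matching the 32K-iteration example */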

SPMD on SIMT Machine

SIMD vs. SIMT Execution Model

  • SIMD: A single sequential instruction stream of SIMD instructions → each instruction specifies multiple data inputs
    • [VLD, VADD, VST], VLEN
  • SIMT: Multiple instruction streams of scalar instructions → threads grouped dynamically into warps
    • [LD, ADD, ST], NumThreads
    • Advantages (illustrated in the sketch after this list)
      • Can treat each thread separately
      • Can group threads into warps flexibly
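
The per-thread flexibility shows up in code like the hypothetical kernel below: because each thread has its own scalar instruction stream, threads may take different control-flow paths, which a fixed-length SIMD instruction cannot express without explicit masking (the SIMT hardware handles the masking within a warp).

__global__ void clampNegative(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && x[i] < 0.0f)   /* each thread decides independently      */
        x[i] = 0.0f;            /* divergent lanes are masked by hardware */
}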

Fine-Grained Multithreading of Warps

  • Assume a warp consists of 32 threads
  • If you have 32K iterations and 1 iteration/thread → 32K threads → 1K warps (32K / 32 = 1K)
  • Warps can be interleaved on the same pipeline → fine-grained multithreading of warps

(Figure: warps interleaved in the pipeline. Iterations map onto warps in groups of 32: Iter. 33 = 1×32 + 1 and Iter. 34 = 1×32 + 2 are the first two iterations of warp 1; warp 20 likewise executes Iter. 20×32 + 1, Iter. 20×32 + 2, and so on.)

Warps and Warp-Level FGMT

  • Warp: A set of threads that execute the same instruction (on different data elements) → SIMT (Nvidia-speak)
  • All threads run the same code (see the sketch below)
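
A short sketch of the warp arithmetic used in these notes (a warp size of 32 is assumed, as on NVIDIA hardware; the kernel name whoAmI is hypothetical): given a global thread id, the warp a thread belongs to and its lane within that warp follow directly, which is exactly the Iter. 20×32 + 1 style of numbering in the figure note above.

__global__ void whoAmI(int *warpOf, int *laneOf, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread id */
    if (tid < n) {
        /* In the notes' numbering, threads k*32 .. k*32+31 form warp k
           (this matches hardware warps when blockDim is a multiple of 32). */
        warpOf[tid] = tid / 32;   /* all 32 threads of a warp share this */
        laneOf[tid] = tid % 32;   /* position within the warp            */
    }
}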
