16. GPU
Created: December 4, 2021, 4:15 PM
GPUs are SIMD Engines Underneath
- instruction pipeline operates like a SIMD pipeline
- However, programming is done using threads, NOT SIMD instructions
How Can You Exploit Parallelism Here?
for (i = 0; i < N; i++)
    C[i] = A[i] + B[i];
Three options for exploiting instruction-level parallelism in the code above:
- Sequential (SISD)
- Data-Parallel (SIMD)
- Multithreaded (MIMD/SPMD)
1. Sequential (SISD)
- Pipelined processor
- Out-of-order execution processor
- Independent instructions executed when ready
- Different iterations are present in the instruction window and can execute in parallel in multiple functional units
- the loop is dynamically unrolled by the hardware (see the sketch after this list)
- Superscalar or VLIW processor
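To make the dynamic unrolling concrete, here is a rough C sketch (the function name is illustrative, and it assumes N is a multiple of 4): unrolling the loop in software exposes the same independent operations that the out-of-order hardware finds on its own.
void vec_add_unrolled(const float *A, const float *B, float *C, int N)
{
    /* Software unrolling that mimics what the OoO hardware does
       dynamically; assumes N % 4 == 0 for brevity. */
    for (int i = 0; i < N; i += 4) {
        C[i]     = A[i]     + B[i];     /* these four adds are independent, */
        C[i + 1] = A[i + 1] + B[i + 1]; /* so an OoO core can issue them    */
        C[i + 2] = A[i + 2] + B[i + 2]; /* to different functional units    */
        C[i + 3] = A[i + 3] + B[i + 3]; /* from the same instruction window */
    }
}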
2. Data-Parallel (SIMD)
- Each iteration is independent
- Programmer or compiler generates a SIMD instruction to execute the same instruction from all iterations across different data (a strip-mined sketch follows this list)
- Best executed by a SIMD processor (vector, array)
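A minimal strip-mined sketch of this option (VLEN and the function name are illustrative; a real vectorizing compiler would emit actual vector instructions such as the VLD/VADD/VST notation used later in these notes):
void vec_add_simd_sketch(const float *A, const float *B, float *C, int N)
{
    /* Each outer iteration corresponds to one VLD/VLD/VADD/VST
       sequence over VLEN elements. VLEN is a hypothetical vector
       length; assumes N % VLEN == 0 for brevity. */
    enum { VLEN = 8 };
    for (int i = 0; i < N; i += VLEN) {
        for (int j = 0; j < VLEN; j++)      /* conceptually ONE VADD:      */
            C[i + j] = A[i + j] + B[i + j]; /* same op, VLEN data elements */
    }
}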
3. Multithreaded
- Each iteration is independent
- Programmer or compiler generates a thread to execute each iteration. Each thread does the same thing, but on different data (see the CUDA sketch after this list)
- Can be executed on a MIMD machine
- This particular model is also called SPMD (Single Program Multiple Data)
- Can be executed on a SIMT machine (Single Instruction Multiple Thread)
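A minimal CUDA sketch of this option (the kernel name is illustrative): every thread runs the same kernel code, and each one handles the single iteration selected by its thread index.
__global__ void vecAdd(const float *A, const float *B, float *C, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // this thread's iteration i
    if (i < N)           // guard: N need not be a multiple of the block size
        C[i] = A[i] + B[i];
}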
GPU is a SIMD (SIMT) Machine
- It is programmed using threads (SPMD programming model)
- Each thread executes the same code but operates on a different piece of data
- A set of threads executing the same instruction is dynamically grouped into a warp (wavefront) by the hardware (see the launch sketch below)
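A host-side launch sketch for a kernel like vecAdd above (256 threads per block is an arbitrary choice; N and the d_* device pointers are assumed to be allocated and initialized already). The hardware then groups each block's threads into warps of 32.
int threadsPerBlock = 256;                                    // arbitrary choice
int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;  // round up
vecAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C, N);     // d_*: device pointers (assumed)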
SPMD on SIMT Machine
SIMD vs. SIMT Execution Model
- SIMD: A single sequential instruction stream of SIMD instructions → each instruction specifies multiple data inputs
- [VLD, VADD, VST], VLEN
- SIMT: Multiple instruction streams of scalar instructions → threads grouped dynamically into warps
- [LD, ADD, ST], NumThreads
- Advantages of SIMT:
- Can treat each thread separately (see the divergence sketch after this list)
- Can group threads into warps flexibly
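Treating each thread separately is what makes divergent control flow expressible: each thread may take its own branch, which a single SIMD instruction stream cannot do directly. A small illustrative CUDA kernel (name and operation are assumptions, not from the source):
__global__ void clampNegatives(float *C, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N && C[i] < 0.0f)  // each thread takes its own branch;
        C[i] = 0.0f;           // the hardware masks inactive lanes of the warp
}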
Fine-Grained Multithreading of Warps
- Assume a warp consists of 32 threads
- If you have 32K iterations, and 1 iteration/thread → 1K warps (arithmetic sketched below)
- Warps can be interleaved on the same pipeline → fine-grained multithreading of warps
(Figure: warps interleaved on one pipeline; e.g., warp 20 runs iterations 20·32 + 1, 20·32 + 2, … while warp 33 runs iterations 33·32 + 1, 33·32 + 2, …)
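The warp count follows from simple arithmetic (a sketch; 32 is the NVIDIA warp size):
const int WARP_SIZE = 32;                  // NVIDIA warp size
int iterations = 32 * 1024;                // 32K iterations, one thread each
int numWarps = (iterations + WARP_SIZE - 1) / WARP_SIZE;  // = 1024 → 1K warps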
Warps and Warp-Level FGMT
- Warp: A set of threads that execute the same instruction (on different data elements) → SIMT (Nvidia-speak)
- All threads run the same code