park jong hyun


School of Electrical and Electronic Engineering, Yonsei University, South Korea

📍 Korea, Republic of
GPGPU

Which GPU Should You Buy for Deep Learning?

Many people are interested in deep learning and buy GPUs to work on it. As someone who specializes in GPU computing and architecture, I feel a bit proud(?) about that. So this post is for people who want to buy a GPU for deep learning. I hope it helps at least a little... (Written based on a blog post and my own knowledge.) AMD? NVIDIA? Buy NVIDIA. There is no room for debate.

GPGPU

Pascal - NVIDIA Announces a New GPU Architecture

NVIDIA has finally announced its new Pascal GPU. Parts(?) of it were already disclosed a few months ago and there were plenty of rumors, so anyone interested probably knew some of it in advance, but now that it is unwrapped there are a few interesting things. Personally, the biggest surprise is half-precision! (They really do seem to want the deep learning market.) So, let's look at what deserves attention. What is Pascal? Since the mid-2000s, NVIDIA has ... its GPU ...
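To give a feel for what half-precision means in practice, here is a minimal Python sketch. It assumes the IEEE 754 binary16 format (exposed by the standard `struct` module as format code `e`) and says nothing about how Pascal implements it in hardware. The helper name `to_half_and_back` is made up for this illustration.

```python
import struct

# Storage size in bytes of one half-precision and one single-precision value.
HALF_BYTES = struct.calcsize("<e")
SINGLE_BYTES = struct.calcsize("<f")


def to_half_and_back(x: float) -> float:
    """Round-trip a Python float through IEEE 754 binary16 (half precision)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    print("bytes per value:", HALF_BYTES, "(half) vs", SINGLE_BYTES, "(single)")
    # 0.1 cannot be represented exactly; the nearest half-precision value is kept.
    print("0.1 as half:", to_half_and_back(0.1))
    # Above 2048 the spacing between representable values is 2, so odd
    # integers such as 2049 cannot be stored exactly.
    print("2049 as half:", to_half_and_back(2049.0))
```

Half the bytes per value means twice as many values per unit of memory and bandwidth, at the cost of precision, which is a trade many deep learning workloads can tolerate.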

GPU

What Makes Professional Graphics Cards Different?

This post explains professional graphics cards. The figure below shows a product that comes up when you click the "Professional VGA" tab on Danawa, and it costs a whopping 7.9 million won.... I will explain in simple terms what these things do to be this expensive. First, let's start with the brand names that the two giants of the GPU world use for their professional graphics cards. At NVIDIA, the consumer graphics cards are the GeForce series (...

Deep Learning

How AlphaGo Plays Go

AlphaGo, the deep-learning-based Go program(?) built by Google's DeepMind, will play Lee Sedol 9-dan from March 9 to March 15. It is the talk of the town, so I got curious and looked into it, and I am writing this post to share what I found. The content is based on the AlphaGo paper published in Nature, and since my own level is lacking, there may be parts I misunderstood ...

GPGPU

STT-RAM for GPU register file

This time I introduce a paper I was forced to study. It looks like a typical well-organized(?) paper, so I want to write it down. It was published at ASP-DAC; look it up yourself for the details. Paper link. Main contribution: the GPU register file is built from MLC STT-RAM (previously SRAM); using the speed difference between the soft bit and the hard bit that is inherent to MLC, frequently used data is mapped to the soft bit ...

GPGPU

GPU Virtualization

Just like a CPU, a GPU can be virtualized!! Put simply, virtualization means fibbing to the user that the hardware is there. For example, the computer has only one CPU, but it tells user A that there is one CPU and tells user B that there is one CPU too, accepts both jobs, and then runs them by slicing up the time. One CPU ...
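The time-slicing idea in the CPU example can be sketched in a few lines of Python. This is only an illustration under simple assumptions (round-robin order, a fixed quantum); the function name `time_slice` and the job lengths are made up here, and real hypervisors and GPU schedulers are far more involved.

```python
from collections import deque


def time_slice(jobs, quantum=1):
    """Share one physical CPU among several users in round-robin order.

    jobs maps a user name to the number of time units that user's job needs.
    Returns the list of users in the order they occupied the CPU,
    one entry per time slice.
    """
    queue = deque(jobs.items())
    schedule = []
    while queue:
        user, remaining = queue.popleft()
        schedule.append(user)          # this user gets the CPU for one slice
        remaining -= quantum
        if remaining > 0:              # unfinished jobs go to the back of the line
            queue.append((user, remaining))
    return schedule


if __name__ == "__main__":
    # Users A and B each believe they own a CPU; the single real CPU alternates.
    print(time_slice({"A": 2, "B": 3}))
```

From each user's point of view the job simply runs to completion, which is exactly the illusion described above.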

GPGPU

CUDA Memory Model

Even when they do the same job, CUDA programs show wildly different performance depending on how they are implemented (the algorithm). In particular, if you do not know the memory model, your program really will end up much slower. CUDA Memory Model: so the first thing you need to learn is the memory model. CUDA offers several kinds of memory you can use. The most important ones are global memory and ...

GPGPU

GPGPU Simulation - Part 2

Here is a real example of the GPGPU simulation I introduced last time. GPGPU-sim: on the gpgpu-sim homepage you can read the manual and download the code. It can be installed and run on Linux, and it does not support the latest version of CUDA.... Let me show a simple run. This is the output after a kernel finishes. It reports the kernel's total instruction count and how many cycles it took, as well as where stalls ...

GPGPU

GPGPU Simulation - Part 1

Simulation: with simulation, you can run CUDA (OpenCL) code on a CPU alone, without a GPU. (Of course, emulation alone can do that too.) There are simulators that implement the behavior of a GPU in software on the CPU. The best-known ones are gpgpu-sim and multi2sim. There is also miaowgpu, which is implemented in Verilog so that you can build the GPU directly on an FPGA rather than on a CPU. Uses of simulation: when it comes to people who have no GPU, this kind of simulation actually ...

GPGPU

GPGPU - Part 2

What is GPGPU?? - Part 2. GPU architecture for GPGPU: as mentioned in Part 1, a GPU is hardware for graphics processing, and graphics processing has a large amount of data-level parallelism. So a GPU basically has a SIMD-style structure. (SIMD = Single Instruction Multiple Data) As in the figure above, having multiple ALUs process multiple data items at the same time under a single instruction is called SIMD ...
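The SIMD idea above (one instruction, many ALUs, many data items) can be mimicked in plain Python. This is only a conceptual sketch: the name `simd_execute` is made up for illustration, and the loop runs the lanes one after another, whereas real SIMD hardware executes them in the same cycle.

```python
import operator


def simd_execute(op, lanes_a, lanes_b):
    """Apply a single operation to every lane of two equally sized inputs.

    Each list element stands for the data handled by one ALU (one lane).
    """
    if len(lanes_a) != len(lanes_b):
        raise ValueError("both operands must have the same number of lanes")
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]


if __name__ == "__main__":
    # One "add" instruction, four lanes of data.
    print(simd_execute(operator.add, [1, 2, 3, 4], [10, 20, 30, 40]))
```

The point is that the instruction is fetched and decoded once and only the data differs per lane, which is why workloads with a lot of data-level parallelism map so well onto this structure.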

GPGPU

GPGPU - Part 1

What is GPGPU (General Purpose computation on GPU)?? Let's start with the GPU (briefly). As everyone probably knows, a GPU (Graphics Processing Unit) is dedicated hardware for graphics computation. It is usually the large chip in the middle of a discrete graphics card. (A graphics card is not a GPU; it is a board that carries a GPU together with memory.) These days CPUs also ship with an integrated GPU. The Intel Core CPU family has HD Graphics ...

Instruction-Level Parallelism (ILP)

Instruction-Level Parallelism (ILP) and Its Exploitation. ILP: Concepts and Challenges. ILP is extracted in two ways: in hardware and in software. CPI < 1. The instructions between one branch and the next branch are called a basic block. Because there is a limit to how much ILP can be extracted within a single basic block, ILP has to be extracted across multiple basic blocks. The simplest ...
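The basic block definition above can be sketched in Python as follows. This is a simplified illustration: it only cuts a straight-line instruction list after each branch and ignores branch targets, which a real compiler also treats as block boundaries. The mnemonics in `BRANCH_OPS` and the sample program are hypothetical.

```python
# Hypothetical branch mnemonics, chosen only for this illustration.
BRANCH_OPS = {"beq", "bne", "j"}


def split_basic_blocks(instructions):
    """Group instructions into blocks, ending a block after every branch."""
    blocks = []
    current = []
    for inst in instructions:
        current.append(inst)
        opcode = inst.split()[0]
        if opcode in BRANCH_OPS:       # a branch closes the current block
            blocks.append(current)
            current = []
    if current:                        # trailing instructions with no branch
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    program = [
        "add r1, r2, r3",
        "beq r1, r0, L1",
        "sub r4, r1, r2",
        "mul r5, r4, r4",
        "j L2",
        "add r6, r5, r1",
    ]
    for i, block in enumerate(split_basic_blocks(program)):
        print(i, block)
```

Blocks this short leave little room for overlapping instructions, which is the limit the note above refers to when it says ILP must be found across several basic blocks.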