park jong hyun

School of Electrical and Electronic Engineering, Yonsei University, South Korea

📍 Korea, Republic of
How AlphaGo Plays Go
Deep Learning

AlphaGo, the deep learning based Go program(?) built by Google's DeepMind, will play Lee Sedol 9-dan from March 9 to March 15. It is the talk of the town, so I got curious, looked into it, and am writing this post to share what I found. The content is based on the AlphaGo paper published in Nature, and since my own level falls short, there may be parts that I misunderstood and wrote down incorrectly...

GPGPU

STT-RAM for GPU register file

This time I introduce a paper that I was forced to study. It looks like a typical well-organized(?) paper, so I want to write it down. It was published at ASP-DAC; look up the details yourself. (Paper link) Main contribution: build the GPU register file out of MLC STT-RAM (conventionally SRAM). Exploiting the speed difference between the soft bit and the hard bit that comes from the nature of MLC, frequently used data is mapped to the soft bit...

GPGPU

GPU Virtualization

Just like a CPU, a GPU can be virtualized!! Put simply, virtualization is bluffing to each user that the hardware is there. For example, a computer has only one CPU, but it tells user A that there is one CPU, tells user B that there is one CPU as well, accepts two jobs in total, and then runs them by slicing up time. One CPU...
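
The time-slicing idea in this excerpt can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any real hypervisor or GPU virtualization layer works: the function name `time_slice`, the `quantum` parameter, and the job sizes are all hypothetical.

```python
from collections import deque

def time_slice(jobs, quantum=1):
    """Run jobs round-robin on a single simulated CPU.

    jobs: dict mapping a user name to the number of work units requested.
    Returns the order in which users occupied the CPU, one entry per slice.
    """
    queue = deque(jobs.items())
    timeline = []
    while queue:
        user, remaining = queue.popleft()
        timeline.append(user)                # this user owns the CPU for one slice
        remaining -= quantum
        if remaining > 0:
            queue.append((user, remaining))  # not finished: back of the line
    return timeline

# Users A and B each believe they have a CPU of their own.
timeline = time_slice({"A": 3, "B": 2})
print(timeline)  # ['A', 'B', 'A', 'B', 'A']
```

Each user only ever sees their own job making progress, while the single CPU alternates between the two.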

CUDA Memory Model
GPGPU

Even when they do the same work, CUDA programs show wildly different performance depending on how they are implemented (the algorithm). In particular, if you do not know the memory model, your program ends up far slower. CUDA Memory Model: the first thing you need to know for this is the memory model. CUDA offers several kinds of memory that you can use. The most important ones are global memory and...

GPGPU Simulation - Part 2
GPGPU

Here is a real example of the GPGPU simulation introduced last time. GPGPU-sim: on the gpgpu-sim homepage you can read the manual and download the code. It installs and runs on Linux, but it does not support the latest version of CUDA... Let me show a simple run. This is the output after a kernel finishes. It reports how many instructions the kernel executed in total and how many cycles it took, as well as where stalls...

GPGPU Simulation - Part 1
GPGPU

Simulation: with simulation you can run CUDA (OpenCL) code on a CPU alone, without a GPU. (Of course, emulation alone is enough for that.) There are simulators that implement the behavior of a GPU in software on a CPU. The representative ones are gpgpu-sim and multi2sim. There is also miaowgpu, which is implemented in Verilog so that you can build the GPU directly on an FPGA rather than on a CPU. Uses of simulation: when it comes to people who have no GPU, such simulation actually...

GPGPU - Part 2
GPGPU

What is GPGPU?? - Part 2. GPU architecture for GPGPU: as mentioned in Part 1, a GPU is hardware for graphics processing, and graphics processing has a large amount of data-level parallelism. So a GPU basically has a SIMD-style structure. (SIMD = Single Instruction Multiple Data) As in the figure above, having several ALUs process several pieces of data at the same time under a single instruction is called SIMD...
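
To make the SIMD idea concrete, here is a minimal Python sketch in which each list element stands for one ALU lane and a single operation is applied to all lanes. The function name `simd_execute` and the sample data are my own, hypothetical choices. Note also that the loop below visits the lanes one after another, whereas real SIMD hardware processes them simultaneously.

```python
import operator

def simd_execute(op, lanes_a, lanes_b):
    """Apply one instruction `op` to every lane; each lane models one ALU."""
    if len(lanes_a) != len(lanes_b):
        raise ValueError("operand vectors must have the same number of lanes")
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]

# One "add" instruction, four lanes of data.
result = simd_execute(operator.add, [1, 2, 3, 4], [10, 20, 30, 40])
print(result)  # [11, 22, 33, 44]
```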

GPGPU - Part 1
GPGPU

What is GPGPU (General Purpose computation on GPU)?? Let's start with the GPU (briefly). As everyone knows, a GPU (Graphics Processing Unit) is dedicated hardware for graphics computation. It is usually the big chip in the middle of a discrete graphics card. (The graphics card is not the GPU; it is a board that carries the GPU together with memory.) These days CPUs also ship with an integrated GPU. The Intel Core CPU family has HD graphics...

Instruction-Level Parallelism (ILP)

Instruction-Level Parallelism (ILP) and Its Exploitation. ILP: Concepts and Challenges. ILP is extracted in two ways: in hardware and in software. CPI < 1. The instructions between one branch and the next branch are called a basic block. Because there is a limit to how much ILP can be extracted inside a single basic block, ILP has to be extracted across multiple basic blocks. The simplest...
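
As a rough illustration of the basic block definition above, the following Python sketch cuts an instruction list after every branch. Everything here is an assumption for the sake of the example: the mnemonics in `BRANCH_OPS`, the sample program, and the function name `split_basic_blocks` are hypothetical. It is also a simplification, since a compiler would additionally start a new block at every branch target.

```python
BRANCH_OPS = {"beq", "bne", "j"}  # hypothetical branch mnemonics for this sketch

def split_basic_blocks(instructions):
    """Cut an instruction list after every branch.

    Simplification: real compilers also start a new block at every branch
    target; this sketch follows only the "between branches" definition.
    """
    blocks, current = [], []
    for inst in instructions:
        current.append(inst)
        opcode = inst.split()[0]
        if opcode in BRANCH_OPS:
            blocks.append(current)  # a branch closes the current block
            current = []
    if current:
        blocks.append(current)      # trailing instructions with no branch
    return blocks

program = [
    "add r1, r2, r3",
    "sub r4, r1, r5",
    "beq r4, r0, done",
    "mul r6, r1, r1",
    "j loop",
    "add r7, r6, r1",
]
blocks = split_basic_blocks(program)
print([len(b) for b in blocks])  # [3, 2, 1]
```

With only a few instructions per block, there is little room to find independent instructions inside one block, which is the limitation the excerpt points at.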