Processors other than the CPU
GPU
SPMD model implemented by a SIMD processor
SIMD vs SIMT
- SIMD: a single sequential instruction stream of SIMD instructions
- SIMT: multiple instruction streams of scalar instructions
Grouping threads into warps --> advantage of SIMD
Independent execution of threads --> advantage of SPMD
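A toy sketch of the SIMT idea in Python (not real GPU code). Each thread runs its own scalar program, but the threads of a warp are issued together under an active mask. The warp size of 4, the function name `run_warp`, and the per-thread program are assumptions made up for illustration.

```python
# Toy SIMT model: scalar threads grouped into a warp and issued in lockstep.
WARP_SIZE = 4  # assumed small warp for illustration; real GPUs commonly use 32


def run_warp(data):
    """Per-thread program: if x is even, x = x // 2, else x = 3 * x + 1.

    Returns the per-thread results and the number of warp-wide issues.
    """
    assert len(data) == WARP_SIZE
    out = list(data)
    issued = 0

    # All threads evaluate the branch condition in one warp-wide issue.
    taken = [x % 2 == 0 for x in out]
    issued += 1

    # "Even" path: issued only if some thread needs it; other lanes are masked off.
    if any(taken):
        out = [x // 2 if m else x for x, m in zip(out, taken)]
        issued += 1

    # "Odd" path: issued only if some thread needs it, under the inverted mask.
    not_taken = [not m for m in taken]
    if any(not_taken):
        out = [3 * x + 1 if m else x for x, m in zip(out, not_taken)]
        issued += 1

    return out, issued


uniform = run_warp([2, 4, 6, 8])    # all threads take the same path
divergent = run_warp([2, 3, 4, 5])  # threads diverge, both paths are issued
```

When all threads agree on the branch, the warp behaves like one SIMD instruction stream. When they diverge, every thread still gets its own correct result (the SPMD view), at the cost of issuing both paths.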
VLIW
- simple hardware (no dynamic scheduling, dependency checking)
- complex compilation (recompilation is needed when hardware parameters such as issue width or operation latency change)
- lockstep execution: when one operation in a bundle stalls, the independent operations stall too
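A minimal sketch of why the compiler carries the burden in VLIW: a greedy static scheduler that packs operations into fixed-width bundles using latencies known at compile time. The function `schedule`, the operation names, and the latency table are hypothetical; real VLIW compilers use far more elaborate techniques.

```python
def schedule(ops, width, latency):
    """Statically pack ops into VLIW bundles, one bundle per cycle.

    ops:     list of (name, kind, deps) in program order; deps name earlier ops
    width:   number of issue slots per bundle
    latency: dict mapping kind -> cycles until the result is available (>= 1)
    An empty bundle stands for a cycle filled entirely with NOPs.
    """
    ready_at = {}        # op name -> cycle at which its result is available
    pending = list(ops)
    bundles = []
    cycle = 0
    while pending:
        bundle = []
        for op in list(pending):
            if len(bundle) == width:
                break    # no free slot left in this bundle
            name, kind, deps = op
            # The hardware does no dependency checking, so the compiler must
            # guarantee that every operand is ready when the op issues.
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                bundle.append(name)
                ready_at[name] = cycle + latency[kind]
                pending.remove(op)
        bundles.append(bundle)
        cycle += 1
    return bundles


ops = [
    ("ld_a", "load", []),
    ("ld_b", "load", []),
    ("add", "alu", ["ld_a", "ld_b"]),
    ("mul", "alu", ["ld_a"]),
    ("st", "store", ["add"]),
]
fast = schedule(ops, width=2, latency={"load": 2, "alu": 1, "store": 1})
slow = schedule(ops, width=2, latency={"load": 3, "alu": 1, "store": 1})
narrow = schedule(ops, width=1, latency={"load": 2, "alu": 1, "store": 1})
```

The three schedules differ, which illustrates the recompilation point: a binary scheduled for one width or latency is not a valid schedule for another machine.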
DAE (Decoupled Access / Execute)
Motivation: Tomasulo's algorithm is too complex
Decouple Access (memory) from Execute (computation); the two communicate through queues
- Queues reduce the number of registers needed
- OoO without wakeup/select complexity
- Branch synchronization between A and E
- multiple instruction streams?
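A toy cycle-level sketch of the DAE idea, assuming a fixed memory latency and one operation per stream per cycle. The function `run_dae`, the workload (doubling an array), and the timing rules are assumptions for illustration, not a model of any real machine.

```python
from collections import deque


def run_dae(memory, n, mem_latency=3):
    """Compute memory[n + i] = 2 * memory[i] for i in range(n).

    The Access (A) stream issues loads and performs stores; the Execute (E)
    stream only computes. They communicate through two FIFO queues.
    Returns (final memory, cycles taken, maximum run-ahead of A over E).
    """
    mem = list(memory)
    load_q = deque()    # A -> E: (cycle at which data arrives, value)
    store_q = deque()   # E -> A: values waiting to be written
    loads_issued = 0
    stores_done = 0
    consumed = 0        # values E has taken from load_q
    max_run_ahead = 0
    cycle = 0
    while stores_done < n:
        # A: drain one pending store.
        if store_q:
            mem[n + stores_done] = store_q.popleft()
            stores_done += 1
        # A: issue the next load without waiting for E (this is the run-ahead).
        if loads_issued < n:
            load_q.append((cycle + mem_latency, mem[loads_issued]))
            loads_issued += 1
        # E: consume the queue head in order, only once its data has arrived.
        if load_q and load_q[0][0] <= cycle:
            _, value = load_q.popleft()
            store_q.append(2 * value)
            consumed += 1
        max_run_ahead = max(max_run_ahead, loads_issued - consumed)
        cycle += 1
    return mem, cycle, max_run_ahead


result, cycles, run_ahead = run_dae([1, 2, 3, 4, 0, 0, 0, 0], n=4, mem_latency=3)
```

Here A gets several loads ahead of E, so memory latency overlaps with computation even though each stream is strictly in order. The queues take the place of the wakeup/select logic an out-of-order core would need.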