Computer Architecture Diagram

Major blocks

Block	Role
Core	Executes instructions — fetch, decode, execute, retire
L1 instruction / data cache	Small, private per core, roughly 3–5 cycle load-to-use latency
L2 cache	Larger; private per core or shared by a small cluster of cores, ~10–15 cycles
L3 / last-level cache (LLC)	Shared across the cores of a socket (or of one chiplet), tens of cycles
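Memory controller	Interfaces to DDR DRAM (typically 2 channels on desktop parts, 8–12 on current server parts)
PCIe root complex	Connects the CPU to PCIe devices such as GPUs, NVMe SSDs, and network adapters
Chipset / IO hub	USB, SATA, and other lower-bandwidth IO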
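Coherence fabric	On-die interconnect (ring, mesh, or point-to-point links) between cores, caches, and memory controllers; carries the protocol that keeps caches coherent
Power / clock management	DVFS through P-states (voltage/frequency levels); idle power saving through C-states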
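
The cache rows above combine into one figure of merit, average memory access time (AMAT): each level costs its hit latency plus its local miss rate times the cost of the next level out. The sketch below is a back-of-the-envelope model under stated assumptions: the latencies (4, 12, and 40 cycles for L1, L2, and L3, 200 cycles for DRAM) and the miss rates are illustrative, not measurements, and `amat` is a hypothetical helper.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, local_miss_rate), ordered L1 outward.
            local_miss_rate = fraction of accesses reaching this level that miss it.
    memory_latency: cycles to reach DRAM after missing the last cache level.
    """
    # Work inward from DRAM: each level costs its own hit time plus
    # (fraction that misses) * (cost of the next level out).
    cost = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost


# Hypothetical latencies and local miss rates, for illustration only.
hierarchy = [
    (4, 0.05),   # L1: 4 cycles, 5% of accesses miss
    (12, 0.30),  # L2: 12 cycles, 30% of L1 misses also miss L2
    (40, 0.50),  # L3: 40 cycles, 50% of L2 misses also miss L3
]
print(f"AMAT = {amat(hierarchy, 200):.2f} cycles")
```

With these assumed numbers the average access costs about 6.7 cycles, much closer to the L1 latency than to DRAM, because 95% of accesses never leave L1.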

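Core pipeline stages

Stage	Role
Fetch	Read instructions from the L1 I-cache; branch prediction picks the next fetch address
Decode	Convert instructions into micro-ops (µops)
Rename	Map architectural registers to physical registers, removing false (WAR/WAW) dependencies
Dispatch	Allocate reorder-buffer entries and place µops in reservation stations (schedulers)
Execute	Run µops on integer, FP/vector, and load/store units once their operands are ready
Writeback	Write results to the physical register file and wake up dependent µops
Retire	Commit results to architectural state in program order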
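
The gap between out-of-order execution and in-order retirement is usually bridged by a reorder buffer (ROB). The sketch below is a simplified model, not any real core's design: each µop gets an entry at dispatch, execution may finish (write back) in any order, and retirement only drains completed entries from the oldest end. The class and method names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: µops complete in any order but retire in program order."""

    def __init__(self):
        self.entries = deque()  # sequence numbers in program order, oldest at the left
        self.done = set()       # sequence numbers that have finished executing

    def dispatch(self, seq):
        # Entries are allocated in program order at dispatch.
        self.entries.append(seq)

    def writeback(self, seq):
        # Execution units may finish µops in any order.
        self.done.add(seq)

    def retire(self):
        """Commit finished µops from the head; stop at the first unfinished one."""
        retired = []
        while self.entries and self.entries[0] in self.done:
            seq = self.entries.popleft()
            self.done.remove(seq)
            retired.append(seq)
        return retired


rob = ReorderBuffer()
for seq in range(4):
    rob.dispatch(seq)

rob.writeback(2)     # µop 2 finishes first (e.g. a fast ALU op)
rob.writeback(1)
print(rob.retire())  # [] : µop 0 (say, a load that missed the cache) is still pending
rob.writeback(0)
print(rob.retire())  # [0, 1, 2] : the head is done, so these drain in order
rob.writeback(3)
print(rob.retire())  # [3]
```

Because nothing past an unfinished µop can retire, a single long-latency load at the head stalls retirement even when younger µops have already executed.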

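Key concepts

Superscalar: the core can issue several instructions per cycle.
Out-of-order: µops execute as soon as their operands are ready rather than in program order, but still retire in order.
SMT / Hyper-Threading: two (or more) hardware threads share one core's pipeline, execution units, and caches, each keeping its own architectural register state.
SIMD (SSE/AVX, NEON): one instruction operates on multiple data elements.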
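Branch prediction: modern predictors often exceed 95% accuracy on typical code; each misprediction flushes the work fetched down the wrong path.
Speculative execution: the core executes past predicted branches and rolls back if the guess was wrong; side-channel attacks such as Spectre and Meltdown exploit this behavior, and their mitigations cost some performance.
NUMA: on multi-socket systems each socket has its own local memory and remote accesses are slower, so pin workloads to CPUs on the same node as their memory.
Chiplets / tiles: AMD (chiplets) and Intel (tiles) split cores and IO across multiple dies in one package.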
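
Branch prediction can be made concrete with the classic bimodal predictor: a table of 2-bit saturating counters indexed by low bits of the branch address. This is a simplified textbook model, not what any specific modern CPU ships; the class, the 10-bit table size, and the branch trace are all hypothetical.

```python
class TwoBitPredictor:
    """Bimodal predictor: 2-bit saturating counters indexed by low PC bits.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start every entry weakly not-taken

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) outcomes predicted correctly, training as it goes."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: a loop-closing branch taken 9 times, then not taken once,
# with the whole loop executed 100 times.
trace = ([(0x400, True)] * 9 + [(0x400, False)]) * 100
print(f"accuracy = {accuracy(TwoBitPredictor(), trace):.1%}")
```

The two-bit hysteresis means the single loop exit does not flip the prediction, so only about one branch in ten mispredicts here. Modern predictors push accuracy higher by also using branch history (for example TAGE-like or perceptron designs); the bimodal table is only the simplest building block.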
