Major blocks
| Block | Role |
|---|---|
| Core | Executes instructions — fetch, decode, execute, retire |
| L1 instruction / data cache | Small, per-core, split into I-cache and D-cache; ~4–5 cycle load latency on current cores |
| L2 cache | Larger; private per core or shared by a small cluster of cores, ~10–15 cycles |
| L3 / last-level cache (LLC) | Shared across cores on socket, tens of cycles |
| Memory controller | Interfaces to DDR DRAM (2–4 channels on client parts, 8–12 on recent servers) |
| PCIe root complex | Connects to GPU, NVMe, network |
| Chipset / IO hub | USB, SATA, slower IO |
| Coherence fabric | On-die interconnect (ring, mesh, or point-to-point) that links cores and cache slices and carries coherence traffic |
| Power / clock management | DVFS, C-states, P-states |
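
The cache rows above can be tied together with an average memory access time (AMAT) estimate. The following is a minimal sketch, not a model of any specific CPU: the function name `amat`, the hit rates, and the latencies are all illustrative assumptions, and it assumes each level's latency is paid by every access that reaches that level.

```python
def amat(levels, dram_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, hit_rate), ordered from L1 outward.
    dram_latency: cycles paid by accesses that miss every cache level.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that reach the current level
    for latency, hit_rate in levels:
        total += reach * latency   # every access reaching this level pays its latency
        reach *= 1.0 - hit_rate    # only the misses continue to the next level
    return total + reach * dram_latency


# Assumed numbers for illustration only: L1, L2, L3, then DRAM.
hierarchy = [(4, 0.95), (12, 0.80), (40, 0.50)]
print(round(amat(hierarchy, dram_latency=200), 2))  # 6.0
```

With these assumed numbers only 0.5% of accesses reach DRAM, yet they contribute about one of the six cycles, which is why last-level cache hit rate matters so much.
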
Pipeline stages (simplified)
| Stage | What happens |
|---|---|
| Fetch | Get instructions from I-cache; branch prediction |
| Decode | Convert to micro-ops (µops) |
| Rename | Map architectural → physical registers |
| Dispatch | Send µops to reservation stations (scheduler), which issue them to execution units once operands are ready |
| Execute | Integer / FP / load-store units |
| Writeback | Store result in physical register |
| Retire | Commit to architectural state (in program order) |
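
The gap between Execute and Retire in the table above can be illustrated with a toy reorder buffer (ROB). This is a simplified sketch under stated assumptions: `simulate_rob` is a hypothetical helper, all instructions are independent and begin executing at cycle 0, and the retire width is unlimited. Real cores track dependencies and retire a bounded number of µops per cycle.

```python
from collections import deque


def simulate_rob(latencies):
    """Toy model: out-of-order completion, in-order retirement.

    latencies[i] is the execution time in cycles of instruction i
    (index = program order). Returns (completion_order, retire_log),
    where retire_log is a list of (cycle, instruction_index).
    """
    finish = {i: lat for i, lat in enumerate(latencies)}
    # Instructions complete as soon as their latency elapses.
    completion_order = sorted(finish, key=lambda i: (finish[i], i))

    rob = deque(range(len(latencies)))  # oldest instruction at the left
    retire_log = []
    cycle = 0
    while rob:
        cycle += 1
        # Only the head may retire; a slow head blocks younger finished ops.
        while rob and finish[rob[0]] <= cycle:
            retire_log.append((cycle, rob.popleft()))
    return completion_order, retire_log


done, retired = simulate_rob([3, 1, 1, 5, 2])
print(done)     # [1, 2, 4, 0, 3]
print(retired)  # [(3, 0), (3, 1), (3, 2), (5, 3), (5, 4)]
```

Instructions 1 and 2 finish at cycle 1 but cannot commit until instruction 0 finishes at cycle 3, so architectural state always changes in program order.
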
Modern CPU features
- Superscalar: multiple instructions issue per cycle.
- Out-of-order: executes ready ops first; retires in order.
- SMT / hyperthreading: typically 2 hardware threads share one core's front end, execution units, and caches.
- SIMD (AVX, NEON): operate on many data elements per instruction.
- Branch prediction: modern predictors reach >95% accuracy.
- Speculative execution: run ahead past predicted branches; roll back if the guess was wrong. Side effects left in caches enable Spectre/Meltdown-class attacks, which require mitigations.
- NUMA: multiple sockets with local memory — pin workloads to local CPU.
- Chiplets / tiles: AMD / Intel split cores and IO across dies.
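
To make the branch prediction item above concrete, here is a minimal sketch of a classic 2-bit saturating counter. It is a textbook scheme, not what current predictors use (they combine many such counters with branch history), and the function name `predict_2bit` and the loop pattern are illustrative assumptions.

```python
def predict_2bit(outcomes, state=2):
    """Fraction of branches predicted correctly by one 2-bit counter.

    outcomes: iterable of booleans (True = branch taken).
    state: counter in 0..3; 0-1 predict not taken, 2-3 predict taken.
    """
    correct = 0
    total = 0
    for taken in outcomes:
        prediction = state >= 2
        correct += prediction == taken
        total += 1
        # Saturating update: step toward the observed outcome, clamp to 0..3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / total if total else 0.0


# A loop branch: taken 9 times, then falls through once, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(predict_2bit(loop_branch))  # 0.9
```

The counter mispredicts only the loop exit each time, because one not-taken outcome moves it from 3 to 2 and it still predicts taken on re-entry. Reaching accuracy above 95% on real code takes history-based predictors layered on top of this idea.
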