Memory Hierarchy Reference
CPU registers to persistent storage — sizes, latencies, and trade-offs.
Latency ladder (typical 2020s CPU)
| Level | Typical size | Access latency | Throughput |
|---|---|---|---|
| Registers | ~1 KB total (a few hundred bytes of architectural state) | < 1 cycle | ~TB/s per core |
| L1 cache | 32–64 KB per core | ~4 cycles (~1 ns) | ~TB/s |
| L2 cache | 512 KB – 2 MB per core | ~12 cycles (~3 ns) | TB/s |
| L3 cache | tens of MB (shared across cores) | ~40 cycles (~10 ns) | ~100s GB/s |
| DRAM | GBs | ~100 ns | 30–80 GB/s per channel |
| NVMe SSD | TBs | 10–100 µs | 3–14 GB/s |
| SATA SSD | TBs | 50–200 µs | ~550 MB/s |
| HDD | 1–30 TB | ~10 ms (seek) | ~200 MB/s |
| Tape | TBs | seconds | MB–GB/s (sequential) |
| Network (LAN) | — | 100 µs – 1 ms | 1–100 Gbit/s |
| Network (internet) | — | 10–100 ms | varies |
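The cache rows of the ladder compose into a single back-of-envelope number via the standard average memory access time (AMAT) recurrence: each level's latency is paid by every access that reaches it, weighted by the product of the miss rates above it. A minimal sketch in Python; the latencies come from the table, but the hit/miss rates are illustrative assumptions, not measurements:

```python
# Back-of-envelope AMAT (average memory access time) for the ladder above.
# Latencies (ns) are the table's typical values; miss rates are assumptions
# chosen only to make the example concrete.

def amat(levels):
    """levels: list of (latency_ns, miss_rate), innermost level first.
    The final level is the backstop; its miss_rate is ignored."""
    total_ns = 0.0
    reach = 1.0  # probability an access gets as far as this level
    for latency_ns, miss_rate in levels:
        total_ns += reach * latency_ns
        reach *= miss_rate
    return total_ns

# L1 (1 ns, 5% miss) -> L2 (3 ns, 50% miss) -> L3 (10 ns, 30% miss) -> DRAM (100 ns)
print(amat([(1, 0.05), (3, 0.5), (10, 0.3), (100, 0.0)]))  # ≈ 2.15 ns
```

Even with a 5% L1 miss rate, the average stays close to the L1 latency; the model also shows why a small change in the last-level miss rate moves the average far more than any cache-latency change does.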
Intuitions
- 1 cycle on a 3 GHz CPU ≈ 0.33 ns.
- L1 hit vs DRAM: ~100× difference (1 ns vs 100 ns).
- DRAM vs NVMe: ~1000× (100 ns vs 100 µs).
- Cache-friendly code can run 10–100× faster than cache-unfriendly code for memory-bound work. ("Cache-oblivious" is a distinct term of art: algorithms that achieve good cache behavior without knowing the cache parameters.)
- The "latency numbers every programmer should know" list popularized by Jeff Dean is the classic reference; the ratios age far better than the absolute values.
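The ratios above are easier to feel when rescaled to human time: if one 3 GHz cycle (~0.33 ns) lasted one second, an L1 hit takes a few seconds, a DRAM access about five minutes, and an HDD seek roughly a year. A small Python sketch (the 1-cycle-to-1-second scale factor is arbitrary; the latencies are the table's typical values):

```python
# Rescale latencies so one 3 GHz CPU cycle (~0.33 ns) becomes one second,
# turning the ladder's ratios into human-scale durations.

CYCLE_NS = 1 / 3.0  # one cycle at 3 GHz, in nanoseconds

def human_scale(latency_ns):
    """Latency in 'human seconds', where 1 cycle = 1 second."""
    return latency_ns / CYCLE_NS

for name, ns in [("L1 hit", 1), ("L3 hit", 10), ("DRAM", 100),
                 ("NVMe read", 100_000), ("HDD seek", 10_000_000)]:
    print(f"{name:>10}: {human_scale(ns):,.0f} s")
```

On this scale a DRAM access is 300 "seconds" and an NVMe read is several "days", which is why hiding I/O latency (batching, prefetch, async) matters so much more than shaving cache cycles.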