DRAM generations
| Type | Data rate | Peak bandwidth (64-bit channel unless noted) |
|---|---|---|
| DDR3-1600 | 1600 MT/s | 12.8 GB/s |
| DDR4-2400 | 2400 MT/s | 19.2 GB/s |
| DDR4-3200 | 3200 MT/s | 25.6 GB/s |
| DDR5-4800 | 4800 MT/s | 38.4 GB/s |
| DDR5-6400 | 6400 MT/s | 51.2 GB/s |
| LPDDR5 | 6400 MT/s | 12.8 GB/s per 16-bit channel (mobile, soldered) |
| HBM2e | 3.6 Gbps/pin | ~460 GB/s per stack (1024-bit interface) |
| HBM3 | 6.4 Gbps/pin | ~819 GB/s per stack (1024-bit interface) |
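
The bandwidth column follows from the data rate and the bus width: transfers per second times bytes per transfer. The sketch below reproduces that arithmetic; the function name `peak_bandwidth_gb_s` is hypothetical, and the 64-bit default and 1024-bit HBM interface width are assumptions about the configurations in the table.

```python
def peak_bandwidth_gb_s(data_rate_mt_s: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in decimal GB/s.

    data_rate_mt_s: transfers per second, in millions (MT/s).
    bus_width_bits: bits moved per transfer (assumed 64 for a DDR channel).
    """
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return data_rate_mt_s * bytes_per_transfer / 1000


# A few rows from the table, recomputed.
for name, rate, width in [
    ("DDR3-1600", 1600, 64),
    ("DDR5-6400", 6400, 64),
    ("HBM3 stack", 6400, 1024),  # 6.4 Gbps/pin across an assumed 1024-bit interface
]:
    print(f"{name}: {peak_bandwidth_gb_s(rate, width):.1f} GB/s")
```

These are theoretical peaks; sustained bandwidth is lower once refresh, bank conflicts, and read/write turnaround are accounted for.
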
Virtual memory
| Concept | Details |
|---|---|
| Page size | 4 KB default; huge pages 2 MB or 1 GB |
| Page table | Maps virtual → physical addresses; multi-level (typically 4 levels on x86-64) |
| TLB | Caches recent translations |
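| TLB miss | Triggers a page-table walk; cost ranges from a few tens of cycles when the page-table entries are cached to ~100 ns or more when the walk reaches DRAM; use huge pages for large working sets |
| Swap / paging | Moves cold pages to disk; modern systems avoid swapping when possible |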
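
To make the multi-level walk concrete, here is a minimal sketch of how a 48-bit x86-64 virtual address splits into four 9-bit table indices plus a 12-bit page offset, assuming 4 KB pages. It also computes TLB reach (entries times page size), which is why huge pages help large working sets. The function names and the 1536-entry TLB size are illustrative assumptions, not figures from any particular CPU.

```python
PAGE_SHIFT = 12   # 4 KB pages -> 12 offset bits
INDEX_BITS = 9    # 512 entries per page-table level
LEVELS = 4        # PML4, PDPT, PD, PT on x86-64 with 4-level paging


def split_virtual_address(va: int) -> tuple:
    """Return (pml4, pdpt, pd, pt, offset) for a 48-bit virtual address."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    mask = (1 << INDEX_BITS) - 1
    indices = []
    # Walk from the top-level table down to the last-level table.
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_SHIFT + level * INDEX_BITS
        indices.append((va >> shift) & mask)
    return (*indices, offset)


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered by a fully populated TLB."""
    return entries * page_size


ASSUMED_TLB_ENTRIES = 1536  # hypothetical second-level TLB size
print(split_virtual_address(0x00007F1234567ABC))
print(tlb_reach_bytes(ASSUMED_TLB_ENTRIES, 4 * 1024) // 2**20, "MiB with 4 KB pages")
print(tlb_reach_bytes(ASSUMED_TLB_ENTRIES, 2 * 2**20) // 2**30, "GiB with 2 MB pages")
```

With the same number of entries, 2 MB pages cover 512 times more memory than 4 KB pages, so far fewer accesses fall outside the TLB.
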
Cache behavior
| Concept | Details |
|---|---|
| Cache line | 64 bytes on x86 and most ARM cores (128 bytes on Apple silicon) |
| Associativity | 8–16-way typical |
| Coherence | MESI / MOESI between cores |
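| False sharing | Threads on different cores write to separate variables that share one cache line, so the line bounces between cores; pad or align per-thread data to the line size (64 B) |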
| Write-back | Dirty lines flushed to next level on eviction |
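
Line size and associativity together determine which set an address maps to. The sketch below assumes a simple modulo-indexed cache, a hypothetical 32 KB, 8-way L1 with 64-byte lines; real last-level caches often hash the index, so treat this as a model rather than a description of any specific CPU.

```python
LINE_SIZE = 64  # bytes per cache line


def num_sets(cache_bytes: int, ways: int, line_size: int = LINE_SIZE) -> int:
    """Number of sets in a set-associative cache."""
    return cache_bytes // (ways * line_size)


def set_index(addr: int, sets: int, line_size: int = LINE_SIZE) -> int:
    """Set an address maps to under plain modulo indexing (an assumption)."""
    return (addr // line_size) % sets


# Hypothetical 32 KB, 8-way L1 data cache.
sets = num_sets(32 * 1024, ways=8)
print("sets:", sets)
# Addresses exactly sets * LINE_SIZE bytes apart compete for the same set.
for addr in (0, 64, sets * LINE_SIZE, 2 * sets * LINE_SIZE):
    print(hex(addr), "-> set", set_index(addr, sets))
```

In this model, more than 8 hot lines at a stride of `sets * LINE_SIZE` bytes would evict each other even though the cache as a whole is mostly empty.
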
Tips
- Design data for locality — contiguous arrays beat linked lists for iteration.
- Align hot data on cache-line boundaries.
- Prefetch: modern CPUs detect sequential access automatically; manual prefetch hints help for irregular patterns.
- Avoid false sharing between threads by padding or separating per-thread data.
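
The alignment and padding tips come down to address arithmetic. As a rough sketch, assuming 64-byte lines, the helpers below round an address up to a line boundary and count how many distinct lines a set of per-thread counters occupies; `align_up` and `lines_touched` are hypothetical names used only for illustration.

```python
CACHE_LINE = 64  # assumed line size in bytes


def align_up(addr: int, alignment: int = CACHE_LINE) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def lines_touched(base: int, stride: int, count: int,
                  line_size: int = CACHE_LINE) -> set:
    """Cache-line numbers holding `count` items laid out `stride` bytes apart."""
    return {(base + i * stride) // line_size for i in range(count)}


base = align_up(100)  # start the array on a line boundary

# Four 8-byte counters packed together: every thread writes the same line.
packed = lines_touched(base, stride=8, count=4)

# The same counters padded to one line each: no line is shared.
padded = lines_touched(base, stride=CACHE_LINE, count=4)

print("packed counters occupy", len(packed), "line(s)")
print("padded counters occupy", len(padded), "line(s)")
```

Padding trades memory for isolation: each counter now costs a full line, which is usually acceptable for a handful of hot per-thread fields but wasteful for large arrays.
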