CPU vs GPU Comparison
How CPUs and GPUs differ — cores, parallelism, memory, and when to use each.
Architecture
| Aspect | CPU | GPU |
|---|---|---|
| Cores | 4–128 complex cores | Thousands of small ALUs (SIMT) |
| Clock | 3–6 GHz | 1–2.5 GHz |
| Pipelines | Deep, out-of-order | Simpler, in-order per lane |
| Branch prediction | Sophisticated | Minimal — divergence is costly |
| Cache per core | 32–64 KB L1 + MBs of L2/L3 | 10s–100s of KB shared memory + L1 per SM |
| Memory bandwidth | 50–500 GB/s (DDR) | 500–3,000 GB/s (HBM) |
| Latency | Low (~1 ns L1) | Hidden by massive parallelism |
| Thread model | Few, heavy threads | Warps/waves of 32–64 threads |
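The "divergence is costly" and warp rows above come from how SIMT hardware handles branches: all lanes in a warp execute both sides of a conditional, and a per-lane mask discards the wrong result. A minimal sketch of that execution model in plain Python (an illustrative simulation, not real GPU code):

```python
# Toy SIMT model: a "warp" of lanes executes BOTH sides of a branch,
# and a per-lane predicate mask selects which result each lane keeps.
WARP_SIZE = 32

def simt_branch(values):
    """Run `x*2 if x is odd else x+1` the way a diverged warp would."""
    mask = [x % 2 == 1 for x in values]      # per-lane predicate
    path_a = [x * 2 for x in values]         # every lane runs the 'then' path
    path_b = [x + 1 for x in values]         # every lane runs the 'else' path
    # Each lane keeps only its masked result; the other path's work is wasted,
    # which is why divergent branches roughly halve warp throughput.
    return [a if m else b for m, a, b in zip(mask, path_a, path_b)]

warp = list(range(WARP_SIZE))
print(simt_branch(warp)[:4])  # lanes 0..3 -> [1, 2, 3, 6]
```

A CPU core with branch prediction would instead speculate down one path and pay only on a misprediction; the warp always pays for both paths when lanes disagree.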
When each wins
| Workload | Better |
|---|---|
| Serial / branchy code | CPU |
| OS, databases, compilers | CPU |
| Matrix multiplies / neural nets | GPU |
| Graphics, ray tracing | GPU |
| Scientific sims (lattice methods) | GPU |
| Small payloads, low latency | CPU |
| Large payloads, throughput | GPU |
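The throughput rows above can be made quantitative with a toy roofline estimate: a kernel's runtime is bounded below by either its compute demand or its memory traffic. The peak numbers below are illustrative assumptions in the ranges from the architecture table, not vendor specs:

```python
# Toy roofline model: execution time is limited by whichever is slower,
# raw compute (FLOPs / peak FLOP/s) or memory traffic (bytes / bandwidth).
def kernel_time(flops, bytes_moved, peak_flops, peak_bw):
    """Lower-bound runtime in seconds for a compute- or bandwidth-bound kernel."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# 4096x4096 single-precision matmul: 2*N^3 FLOPs, ~3*N^2 floats of traffic.
N = 4096
flops = 2 * N**3
bytes_moved = 3 * N * N * 4

# Assumed peaks: ~1 TFLOP/s + 100 GB/s (CPU-like), ~50 TFLOP/s + 2 TB/s (GPU-like).
cpu = kernel_time(flops, bytes_moved, peak_flops=1e12, peak_bw=100e9)
gpu = kernel_time(flops, bytes_moved, peak_flops=50e12, peak_bw=2000e9)
print(f"CPU >= {cpu*1e3:.1f} ms, GPU >= {gpu*1e3:.1f} ms")
```

Under these assumptions the matmul is compute-bound on both devices, so the GPU's higher FLOP rate wins by roughly the ratio of peak throughputs; a small, latency-sensitive payload would not amortize the GPU's launch and transfer overhead, which is why the table sends it to the CPU.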
Specialized accelerators
- TPU (Google)
  - Tensor Processing Unit — dense matmul for training and inference
- Trainium / Inferentia (AWS)
  - Cloud ML accelerators for training and inference, respectively
- NPU
  - Neural Processing Unit — on-device ML (Apple, Qualcomm)
- FPGA
  - Reconfigurable hardware for custom pipelines
- DPU
  - Data Processing Unit — offloads networking/storage from the host CPU