CPU vs GPU Comparison

How CPUs and GPUs differ — cores, parallelism, memory, and when to use each.

Reference · Updated Apr 19, 2026

Architecture

| Aspect | CPU | GPU |
| --- | --- | --- |
| Cores | 4–128 complex cores | 1,000s of small ALUs (SIMT) |
| Clock | 3–6 GHz | 1–2.5 GHz |
| Pipelines | Deep, out-of-order | Simpler, in-order per lane |
| Branch prediction | Sophisticated | Minimal; divergence is costly |
| Cache per core | 32 KB L1 + MBs of L2/L3 | ~KBs of shared memory + L1 |
| Memory bandwidth | 50–500 GB/s | 500–3,000 GB/s (HBM) |
| Latency | Low (~1 ns L1) | Hidden by massive parallelism |
| Thread model | Few heavyweight threads | Warps/waves of 32–64 threads |
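Why divergence is costly in a SIMT model can be sketched in plain Python with NumPy: a warp executes one instruction across all lanes, so a data-dependent branch is handled by evaluating both sides for every lane and selecting with a mask. The example below is an illustrative analogy, not GPU code; `np.where` mirrors this both-paths-then-mask behavior.

```python
import numpy as np

x = np.arange(8, dtype=np.float64)

# SIMT-style execution: one instruction applied across all lanes at once.
doubled = x * 2.0

# Divergent branch, SIMT-style: both sides are evaluated for every lane,
# then a per-lane mask selects the result. This is why a divergent branch
# costs roughly the sum of both paths on a GPU.
result = np.where(x % 2 == 0, x * 10.0, x + 100.0)

print(doubled.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
print(result.tolist())   # [0.0, 101.0, 20.0, 103.0, 40.0, 105.0, 60.0, 107.0]
```

A CPU, by contrast, predicts the branch and executes only the taken path, which is why branchy serial code favors it.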

When each wins

| Workload | Better fit |
| --- | --- |
| Serial / branchy code | CPU |
| OS, databases, compilers | CPU |
| Matrix multiplies / neural nets | GPU |
| Graphics, ray tracing | GPU |
| Scientific sims (lattice methods) | GPU |
| Small payloads, low latency | CPU |
| Large payloads, throughput | GPU |
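A quick roofline-style estimate shows why matrix multiplies land on the GPU side of the table: a dense n×n matmul does about 2n³ FLOPs over roughly 3n² × 8 bytes of ideal traffic, so arithmetic intensity grows with n and large matmuls become compute-bound. The peak figures below are illustrative assumptions in line with the bandwidth range above, not measurements of any specific chip.

```python
def arithmetic_intensity(n, bytes_per_elem=8):
    """FLOPs per byte for an n x n dense matmul with ideal reuse."""
    flops = 2 * n**3                          # n^3 multiply-add pairs
    bytes_moved = 3 * n**2 * bytes_per_elem   # read A and B, write C
    return flops / bytes_moved

# Assumed GPU peaks: 20 TFLOP/s over 2 TB/s of HBM -> balance of 10 FLOP/byte.
machine_balance = 20e12 / 2e12

for n in (64, 1024):
    ai = arithmetic_intensity(n)
    bound = "compute-bound" if ai > machine_balance else "memory-bound"
    print(f"n={n}: {ai:.1f} FLOP/byte -> {bound}")
```

Small matmuls (n=64 here) fall below the machine balance and are limited by memory bandwidth, which is one reason small, latency-sensitive payloads often stay on the CPU.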

Specialized accelerators

TPU (Google): Tensor Processing Unit; dense matrix multiplies for training and inference
Trainium / Inferentia (AWS): cloud ML accelerators for training and inference, respectively
NPU: Neural Processing Unit; on-device ML (Apple, Qualcomm)
FPGA: reconfigurable hardware for custom pipelines
DPU: Data Processing Unit; offloads networking and storage
