Floating-Point Formats

IEEE 754 and ML-specific float formats — bit layout, range, and precision.

Updated Apr 19, 2026

Formats

Format	Total bits	Sign bits	Exponent bits	Mantissa bits	Exponent bias	Max finite	Min normal	Decimal digits
FP64 (double)	64	1	11	52	1023	~1.8e308	~2.2e-308	~15.9
FP32 (float)	32	1	8	23	127	~3.4e38	~1.2e-38	~7.2
FP16 (half)	16	1	5	10	15	65504	~6.1e-5	~3.3
BF16 (brain float)	16	1	8	7	127	~3.4e38	~1.2e-38	~2.4
FP8 E4M3	8	1	4	3	7	448	~1.56e-2	~1.2
FP8 E5M2	8	1	5	2	15	57344	~6.1e-5	~0.9

Special values (IEEE 754)

±0	sign bit clear (+0) or set (−0), exponent = 0, mantissa = 0
±∞	exponent all 1s, mantissa = 0
NaN	exponent all 1s, mantissa ≠ 0 (quiet/signaling variants)
Subnormals	exponent = 0, mantissa ≠ 0 — gradual underflow
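
The rules above can be sketched as a classifier over raw FP32 bit patterns. This is an illustrative sketch only: the helper names `classify_fp32` and `fp32_bits` are hypothetical, and the code does not distinguish quiet from signaling NaNs.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the 32-bit pattern of x after rounding it to FP32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def classify_fp32(bits: int) -> str:
    """Classify a 32-bit pattern using the exponent and mantissa fields."""
    sign = "-" if (bits >> 31) & 1 else "+"
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    if exponent == 0xFF:
        # Exponent all 1s: infinity if the mantissa is zero, NaN otherwise.
        return f"{sign}inf" if mantissa == 0 else "NaN"
    if exponent == 0:
        # Exponent zero: signed zero or a subnormal.
        return f"{sign}0" if mantissa == 0 else "subnormal"
    return "normal"


if __name__ == "__main__":
    for value in (0.0, -0.0, 1.0, float("inf"), float("nan"), 1e-45):
        print(repr(value), hex(fp32_bits(value)), classify_fp32(fp32_bits(value)))
```
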

Notes

FP16 vs BF16: same 16 bits; BF16 trades precision (7 mantissa bits) for an exponent range matching FP32, which is why it is often preferred for ML training.
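FP8 formats are used for quantized training/inference; E5M2 has more range, E4M3 more precision. The E4M3 maximum of 448 assumes the common variant that has no infinities and reserves only the all-ones mantissa pattern for NaN, so it does not follow the IEEE 754 special-value rules listed above.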
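Integer exactness: every integer with magnitude up to 2²⁴ is exactly representable in FP32, and up to 2⁵³ in FP64. A 32-bit int therefore always round-trips through FP64 but can lose its low bits in FP32.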
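0.1 + 0.2 ≠ 0.3: a binary float can exactly represent only fractions whose denominator is a power of two, so 0.1, 0.2, and 0.3 are each rounded and the rounding errors do not cancel.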
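
The last two notes are easy to check directly. The sketch below uses Python, whose `float` is FP64, and round-trips through `struct` to emulate FP32; the helper name `to_fp32` is hypothetical.

```python
import math
import struct


def to_fp32(x: float) -> float:
    """Round an FP64 value to the nearest FP32 value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


if __name__ == "__main__":
    # FP32 holds every integer up to 2**24; 2**24 + 1 needs 25 significant bits.
    print(to_fp32(2**24) == 2**24)          # True
    print(to_fp32(2**24 + 1) == 2**24 + 1)  # False

    # FP64 has the same limit at 2**53.
    print(float(2**53) == 2**53)            # True
    print(float(2**53 + 1) == 2**53 + 1)    # False

    # Decimal fractions such as 0.1 are rounded; power-of-two fractions are not.
    print(0.1 + 0.2 == 0.3)                 # False
    print(0.5 + 0.25 == 0.75)               # True

    # Compare with a tolerance instead of exact equality.
    print(math.isclose(0.1 + 0.2, 0.3))     # True
```
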
