Machine Learning Loss Functions
Regression, classification, and task-specific losses: what each measures and common pitfalls.
Regression
| Loss | Formula | Notes |
|---|---|---|
| MSE / L2 | (1/N) Σ (y − ŷ)² | Smooth; quadratic penalty makes outliers dominate |
| MAE / L1 | (1/N) Σ \|y − ŷ\| | Robust to outliers; gradient has constant magnitude |
| Huber | ½e² if \|e\| ≤ δ, else δ(\|e\| − δ/2) | e = y − ŷ; quadratic near zero, linear in the tails |
| Log-cosh | (1/N) Σ log(cosh(e)) | Smooth everywhere; ≈ L2 for small errors, ≈ L1 for large |
| Quantile (pinball) | (1/N) Σ max(q·e, (q−1)·e) | Targets the q-th quantile; q = 0.5 gives median regression |
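These formulas translate directly to code. A minimal NumPy sketch; the array names `y`, `y_hat` and the default values for `delta` and `q` are illustrative choices, not canonical:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    e = y - y_hat
    quad = 0.5 * e ** 2                       # L2 region near zero
    lin = delta * (np.abs(e) - 0.5 * delta)   # L1 region beyond delta
    return np.mean(np.where(np.abs(e) <= delta, quad, lin))

def quantile(y, y_hat, q=0.5):
    e = y - y_hat
    return np.mean(np.maximum(q * e, (q - 1) * e))  # pinball loss
```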
Classification
| Loss | Formula | Notes |
|---|---|---|
| Binary cross-entropy | −[y·log(p) + (1−y)·log(1−p)] | Use with sigmoid output |
| Categorical cross-entropy | −Σ yᵢ · log(pᵢ) | Use with softmax output |
| Sparse categorical CE | Index-label version | Same as above with integer labels |
| Hinge | max(0, 1 − y·ŷ) | SVMs; y ∈ {−1, +1} |
| Focal loss | −(1 − p_t)^γ · log(p_t) | Imbalanced classification |
| Label smoothing | Replace target 1 with 1−ε, spread ε over the other classes | Prevents overconfident predictions |
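A NumPy sketch of the two probability-based losses above, binary cross-entropy and focal loss; the clipping epsilon and the γ default are illustrative assumptions (γ = 2 is common but not universal):

```python
import numpy as np

def bce(y, p, eps=1e-12):
    # y in {0, 1}; p is the predicted probability of the positive class
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

def focal(y, p, gamma=2.0, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)  # probability of the true class
    return np.mean(-((1 - p_t) ** gamma) * np.log(p_t))
```

The (1 − p_t)^γ factor down-weights well-classified examples, so training focuses on the hard, typically minority-class, samples.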
Task-specific
| Loss | Use |
|---|---|
| Triplet loss | Metric learning — embeddings |
| Contrastive (InfoNCE) | Self-supervised, CLIP-style training |
| Dice loss (1 − Dice coefficient) | Image segmentation (handles class imbalance) |
| IoU / Jaccard | Segmentation / detection |
| CTC loss | Sequence prediction without alignment (speech, OCR) |
| DPO / reward modeling | RLHF — preference-based fine-tuning |
| KL divergence | Distillation, variational methods |
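As one concrete example from this table, a minimal triplet-loss sketch; squared Euclidean distance and a fixed margin are common assumptions, but both vary across papers:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull anchor toward positive, push it past negative by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distances
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))
```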
Notes
- Add a small regularization term (L1 / L2 on weights) to reduce overfitting.
- Log-likelihoods should be computed in log-space to avoid numerical underflow; use log_softmax + NLL rather than taking softmax and then log (see the sketch below).
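A sketch of that stable pattern, assuming PyTorch; `F.cross_entropy` fuses the same two steps:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # batch of 4, 10 classes
targets = torch.randint(0, 10, (4,))  # integer class labels

# Stable: the log is fused into the softmax normalization.
loss = F.nll_loss(F.log_softmax(logits, dim=-1), targets)

# Equivalent in exact arithmetic, but softmax followed by log can
# underflow to log(0) = -inf when logits are extreme.

# F.cross_entropy(logits, targets) performs the fused computation directly.
```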