Machine Learning Loss Functions

Regression, classification, and task-specific losses — what each one measures and what to watch out for.

Updated Apr 19, 2026

Regression

Loss        Formula                              Notes
MSE / L2    (1/N) Σ (y − ŷ)²                     Smooth; penalizes outliers heavily
MAE / L1    (1/N) Σ |y − ŷ|                      Robust to outliers; gradient has constant magnitude
Huber       ½e² if |e| ≤ δ, else δ(|e| − δ/2)    Smooth near zero, robust to outliers
Log-cosh    Σ log(cosh(e))                       Smooth everywhere; ≈ L1 for large errors
Quantile    Σ max(q·e, (q−1)·e)                  Fits a chosen quantile q; e = y − ŷ
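
The Huber and quantile rows are the easiest to get subtly wrong, so here is a minimal PyTorch sketch of both, assuming residuals e = y − ŷ; the function names and the δ and q defaults are illustrative, not part of the table:

```python
import torch

def huber(e: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
    # Quadratic inside the |e| <= delta band, linear outside it.
    quadratic = 0.5 * e ** 2
    linear = delta * (e.abs() - 0.5 * delta)
    return torch.where(e.abs() <= delta, quadratic, linear).mean()

def quantile(e: torch.Tensor, q: float = 0.9) -> torch.Tensor:
    # Pinball loss: max(q*e, (q-1)*e) per residual; q = 0.5 recovers MAE / 2.
    return torch.maximum(q * e, (q - 1) * e).mean()
```

At q = 0.9 under-prediction costs nine times as much as over-prediction, which is what pulls the fit toward the 90th percentile.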

Classification

Loss                       Formula                         Notes
Binary cross-entropy       −[y·log(p) + (1−y)·log(1−p)]    Use with a sigmoid output
Categorical cross-entropy  −Σ yᵢ · log(pᵢ)                 Use with a softmax output
Sparse categorical CE      same, with integer labels       Skips one-hot encoding the targets
Hinge                      max(0, 1 − y·ŷ)                 SVMs; y ∈ {−1, +1}
Focal loss                 −(1 − p_t)^γ · log(p_t)         Down-weights easy examples; for class imbalance
Label smoothing            one-hot 1 → 1−ε, 0 → ε/(K−1)    Prevents overconfidence; K = number of classes
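
To make the focal-loss row concrete: scaling per-example cross-entropy by (1 − p_t)^γ shrinks the contribution of examples the model already classifies confidently. A minimal binary sketch in PyTorch, assuming raw logits and float 0/1 targets (the function name and γ default are illustrative):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      gamma: float = 2.0) -> torch.Tensor:
    # Per-example BCE equals -log(p_t); the (1 - p_t)^gamma factor
    # down-weights well-classified examples so hard ones dominate.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1.0, p, 1.0 - p)
    return ((1.0 - p_t) ** gamma * ce).mean()
```

With γ = 0 this reduces to plain binary cross-entropy.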

Task-specific

Loss                   Use
Triplet loss           Metric learning on embeddings
Contrastive (InfoNCE)  Self-supervised and CLIP-style training
Dice loss              Image segmentation (1 − Dice coefficient; handles imbalance)
IoU / Jaccard          Segmentation / detection overlap
CTC loss               Sequence prediction without alignment (speech, OCR)
DPO / reward modeling  Preference-based fine-tuning (RLHF)
KL divergence          Distillation, variational methods
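
As one worked row, here is a symmetric CLIP-style InfoNCE sketch in PyTorch, assuming two batches of paired embeddings; the temperature default and function name are illustrative:

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    # Cosine similarities between every cross-batch pair; the diagonal holds
    # the matching (positive) pairs, every other entry acts as a negative.
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / tau
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Cross-entropy in both directions (a -> b and b -> a), as in CLIP.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```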

Notes

  • Add a small regularization term (L1 / L2 on weights) to reduce overfitting.
  • Compute log-likelihoods in log-space to avoid numerical underflow: use log_softmax + NLL rather than softmax followed by log (see the sketch after this list).
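
A small sketch of the underflow point, using exaggerated logits to force the failure; note that F.cross_entropy fuses the two stable steps into a single call:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10) * 50          # extreme logits to provoke underflow
labels = torch.randint(0, 10, (8,))
rows = torch.arange(8)

# Naive: softmax can round tiny probabilities to exactly 0, so log() gives -inf.
naive = -torch.log(F.softmax(logits, dim=-1)[rows, labels]).mean()

# Stable: log_softmax applies the log-sum-exp trick before the NLL reduction.
stable = F.nll_loss(F.log_softmax(logits, dim=-1), labels)

print(naive, stable)  # naive is often inf here; stable is always finite
```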
