
Machine Learning Loss Functions

Regression, classification, and task-specific losses: what each one measures and what to watch out for.

Regression

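Loss        Formula                             Notes
MSE / L2    (1/N) Σ (y − ŷ)²                    Smooth; penalizes outliers heavily
MAE / L1    (1/N) Σ |y − ŷ|                     Robust to outliers; constant-magnitude gradient, not differentiable at 0
Huber       ½e² if |e| ≤ δ, else δ(|e| − ½δ)    Quadratic (L2) near zero, linear (L1) in the tails; smooth and robust
Log-cosh    Σ log(cosh(e))                      Smooth everywhere, outlier-resistant
Quantile    Σ max(q·e, (q−1)·e)                 Regression for a specific quantile q; q = 0.5 targets the median

Here e = y − ŷ is the residual.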
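The regression formulas above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, every loss is averaged over the batch (the table writes some as plain sums), and the residual is assumed to be e = y − ŷ.

```python
import math

def mse(y_true, y_pred):
    # Mean of squared residuals.
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean of absolute residuals.
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    # Quadratic for |e| <= delta, linear beyond it; delta=1.0 is an assumed default.
    total = 0.0
    for y, p in zip(y_true, y_pred):
        e = abs(y - p)
        total += 0.5 * e * e if e <= delta else delta * (e - 0.5 * delta)
    return total / len(y_true)

def log_cosh(y_true, y_pred):
    # log(cosh(e)) rewritten as |e| + log1p(exp(-2|e|)) - log(2)
    # so that large residuals do not overflow cosh.
    total = 0.0
    for y, p in zip(y_true, y_pred):
        e = abs(y - p)
        total += e + math.log1p(math.exp(-2.0 * e)) - math.log(2.0)
    return total / len(y_true)

def quantile_loss(y_true, y_pred, q=0.5):
    # Pinball loss: under-prediction is weighted by q, over-prediction by (1 - q).
    total = 0.0
    for y, p in zip(y_true, y_pred):
        e = y - p
        total += max(q * e, (q - 1.0) * e)
    return total / len(y_true)
```

With a single large residual in the batch, `mse` grows with the square of that residual while `mae` and `huber` grow only linearly, which is the outlier behavior the Notes column describes.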

Classification

Loss                         Formula                                      Notes
Binary cross-entropy         −[y·log(p) + (1−y)·log(1−p)]                 Use with sigmoid output
Categorical cross-entropy    −Σ yᵢ · log(pᵢ)                              Use with softmax output
Sparse categorical CE        Index-label version: −log(p_c) for class c   Same as above with integer labels
Hinge                        max(0, 1 − y·ŷ)                              SVMs; y ∈ {−1, +1}
Focal loss                   −(1 − p_t)^γ · log(p_t)                      Imbalanced classification; p_t is the probability assigned to the true class
Label smoothing              Replace the one-hot 1 with 1−ε               Prevents overconfidence; the removed ε is spread over the other classes
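As a rough sketch of the classification formulas above, the per-example losses can be written in plain Python. The function names, the clipping constant `EPS`, and the default `gamma` and `eps` values are assumptions made for illustration. The label-smoothing helper uses the variant in which the true class gets 1−ε and the remaining ε is split evenly over the other classes; some libraries instead mix the one-hot vector with a uniform distribution over all classes.

```python
import math

EPS = 1e-12  # assumed clipping constant that keeps log() away from zero

def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)

def binary_cross_entropy(y, p):
    # y is 0 or 1; p is the predicted probability of class 1 (sigmoid output).
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def categorical_cross_entropy(y_onehot, probs):
    # y_onehot and probs have one entry per class (softmax output).
    return -sum(y * math.log(_clip(p)) for y, p in zip(y_onehot, probs))

def sparse_categorical_cross_entropy(label, probs):
    # Same quantity, but the target is an integer class index.
    return -math.log(_clip(probs[label]))

def hinge(y, score):
    # y is -1 or +1; score is the raw (unsquashed) model output.
    return max(0.0, 1.0 - y * score)

def focal_loss(y, p, gamma=2.0):
    # p_t is the probability assigned to the true class; gamma=0 recovers BCE.
    p_t = _clip(p if y == 1 else 1.0 - p)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def smooth_labels(y_onehot, eps=0.1):
    # True class gets 1 - eps; eps is split evenly across the other classes.
    k = len(y_onehot)
    return [(1.0 - eps) if y == 1 else eps / (k - 1) for y in y_onehot]
```

The smoothed targets can be passed to `categorical_cross_entropy` in place of the one-hot vector. Setting `gamma=0` in `focal_loss` is a quick way to check it against `binary_cross_entropy`.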

Task-specific

Loss                              Use
Triplet loss                      Metric learning (embeddings)
Contrastive (InfoNCE)             Self-supervised, CLIP-style training
Dice loss (1 − Dice coefficient)  Image segmentation (handles imbalance)
IoU / Jaccard loss (1 − IoU)      Segmentation / detection
CTC loss                          Sequence prediction without alignment (speech, OCR)
DPO / reward modeling             RLHF-style preference-based fine-tuning
KL divergence                     Distillation, variational methods
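A few of the task-specific entries are simple enough to sketch directly. The code below is illustrative only: the function names, the `smooth` and `margin` defaults, and the use of Euclidean (rather than squared) distance in the triplet loss are assumptions, and real implementations operate on tensors rather than Python lists.

```python
import math

def dice_loss(pred, target, smooth=1.0):
    # pred and target are flat masks with values in [0, 1].
    # smooth (assumed default 1.0) avoids division by zero on empty masks.
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + smooth) / (sum(pred) + sum(target) + smooth)

def iou(pred, target):
    # Intersection over union for binary masks; returns 1.0 when both are empty.
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return inter / union if union > 0 else 1.0

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Pull the positive closer to the anchor than the negative by at least margin.
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions; terms with p_i = 0 contribute zero.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

An IoU-based loss is then `1.0 - iou(pred, target)`. Note that `kl_divergence` is not symmetric: swapping `p` and `q` generally gives a different value, which matters when choosing which distribution plays the teacher in distillation.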

Notes

  • Add a small regularization term (L1 / L2 on weights) to reduce overfitting.
  • Compute log-likelihoods in log-space to avoid numerical underflow: use log_softmax + NLL instead of softmax followed by log.
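To make the two notes above concrete, here is a small sketch in plain Python. The names `l2_penalty`, `log_softmax`, `nll`, and `naive_log_softmax` are hypothetical helpers written for this example, and `lam` is an assumed regularization strength. The stable version uses the log-sum-exp trick, subtracting the largest logit before exponentiating.

```python
import math

def l2_penalty(weights, lam=1e-4):
    # Regularization term to add to the data loss: lam * sum of squared weights.
    return lam * sum(w * w for w in weights)

def log_softmax(logits):
    # Log-sum-exp trick: shift by the max logit so exp() cannot overflow,
    # and never take the log of an underflowed probability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def nll(log_probs, label):
    # Negative log-likelihood of the true class, given log-probabilities.
    return -log_probs[label]

def naive_log_softmax(logits):
    # softmax followed by log: shown only for contrast, fails on extreme logits.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]
```

For logits such as `[-1000.0, 0.0]`, `naive_log_softmax` raises a `ValueError` because the first probability underflows to zero before the log is taken, while `log_softmax` returns a finite log-probability of about −1000 for that class.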