Neural Network Activation Functions

Common activations — formulas, output range, and typical use.

Activations

Name         | Formula                         | Range               | Use / notes
-------------|---------------------------------|---------------------|--------------------------------------------
Identity     | f(x) = x                        | (−∞, ∞)             | Regression output
Sigmoid      | 1 / (1 + e^{−x})                | (0, 1)              | Binary classification output; saturates
Tanh         | (e^x − e^{−x}) / (e^x + e^{−x}) | (−1, 1)             | Zero-centered; saturates
ReLU         | max(0, x)                       | [0, ∞)              | Default hidden activation; cheap; can "die"
Leaky ReLU   | max(0.01x, x)                   | (−∞, ∞)             | Fixes dying-ReLU
PReLU        | max(αx, x), α learned           | (−∞, ∞)             | Parametric leaky
ELU          | x if x>0 else α(e^x − 1)        | (−α, ∞)             | Smooth, zero-centered
GELU         | x · Φ(x) ≈ 0.5x(1 + tanh(…))    | (≈−0.17, ∞)         | Transformers (BERT, GPT)
SiLU / Swish | x · sigmoid(x)                  | (≈−0.28, ∞)         | EfficientNet, modern LLMs
Mish         | x · tanh(softplus(x))           | (≈−0.31, ∞)         | Alternative to Swish
Softplus     | ln(1 + e^x)                     | (0, ∞)              | Smooth ReLU
Softmax      | e^{xᵢ} / Σ e^{xⱼ}               | (0, 1) summing to 1 | Multi-class output
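
The formulas in the table can be sketched in plain Python as follows. This is a minimal scalar illustration using only the standard library, not how any framework implements them. The function names, the overflow-safe rewrites of sigmoid, softplus, and softmax, and the grid_min helper are assumptions added here for illustration. GELU uses the exact x · Φ(x) form through math.erf rather than the tanh approximation.

```python
import math
from typing import Callable, List


def sigmoid(x: float) -> float:
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    # Matches max(0.01x, x) from the table; assumes 0 <= slope < 1.
    return max(slope * x, x)


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def gelu(x: float) -> float:
    # Exact form x * Phi(x), with Phi the standard normal CDF written via erf.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    return x * sigmoid(x)


def softplus(x: float) -> float:
    # max(x, 0) + log1p(exp(-|x|)) equals ln(1 + e^x) but cannot overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    return x * math.tanh(softplus(x))


def softmax(xs: List[float]) -> List[float]:
    # Subtracting the max leaves the result unchanged and avoids overflow.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def grid_min(f: Callable[[float], float], lo: float = -5.0,
             hi: float = 0.0, steps: int = 5000) -> float:
    # Hypothetical helper: coarse grid search over [lo, hi], used only to
    # sanity-check the approximate lower bounds listed in the table.
    return min(f(lo + (hi - lo) * i / steps) for i in range(steps + 1))
```

Calling grid_min(gelu), grid_min(silu), and grid_min(mish) should land close to the lower bounds in the Range column, and softmax([1.0, 2.0, 3.0]) should sum to 1 up to floating-point rounding.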

Picking one

  • Hidden layers: ReLU for CNNs, GELU/SiLU for transformers.
  • Output for classification: sigmoid (binary), softmax (multi-class).
  • Output for regression: linear (no activation).
  • Dying ReLU: use Leaky ReLU, ELU, or check that the learning rate isn't too high.
  • Sigmoid/Tanh in deep nets: avoid as hidden activations; both saturate, so gradients vanish as depth grows.
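
The last two points can be illustrated with a small sketch that multiplies local derivatives along a single path through a network. It rests on a simplifying assumption: weights are ignored (treated as 1), so only the activation's own derivative contributes. The names chained_grad and depth are hypothetical and chosen for this example.

```python
import math
from typing import Callable, List


def sigmoid_grad(x: float) -> float:
    # sigmoid(x) written as 0.5 * (1 + tanh(x / 2)) to avoid overflow.
    s = 0.5 * (1.0 + math.tanh(x / 2.0))
    return s * (1.0 - s)  # largest value is 0.25, reached at x = 0


def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    return 1.0 if x > 0 else slope


def chained_grad(grad_fn: Callable[[float], float],
                 pre_activations: List[float]) -> float:
    # Product of local derivatives along one path, one factor per layer.
    # Assumption: weights are treated as 1 so only the activation matters.
    g = 1.0
    for z in pre_activations:
        g *= grad_fn(z)
    return g


depth = 20

# Even in sigmoid's best case (every pre-activation at 0), the gradient
# shrinks by a factor of 4 per layer.
best_case_sigmoid = chained_grad(sigmoid_grad, [0.0] * depth)

# ReLU passes the gradient through unchanged while units are active...
active_relu = chained_grad(relu_grad, [1.0] * depth)

# ...but a unit stuck at negative pre-activations gets no gradient at all.
dead_relu = chained_grad(relu_grad, [-1.0])

# Leaky ReLU keeps a small nonzero gradient on the negative side.
leaky = chained_grad(leaky_relu_grad, [-1.0])
```

Under these assumptions, best_case_sigmoid equals 0.25 raised to the 20th power, which is below 1e-12, while active_relu stays at 1.0. The dead ReLU unit receives a gradient of exactly 0, whereas the leaky variant still receives its negative-side slope, which is why it can recover.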