Neural Network Activation Functions
Common activations — formulas, output range, and typical use.
Activations
| Name | Formula | Range | Use / notes |
|---|---|---|---|
| Identity | f(x) = x | (−∞, ∞) | Regression output |
| Sigmoid | 1 / (1 + e^{−x}) | (0, 1) | Binary classification output; saturates |
| Tanh | (e^x − e^{−x}) / (e^x + e^{−x}) | (−1, 1) | Zero-centered; saturates |
| ReLU | max(0, x) | [0, ∞) | Default hidden activation; cheap; can "die" |
| Leaky ReLU | max(0.01x, x) | (−∞, ∞) | Mitigates dying ReLU: the small negative slope keeps the gradient nonzero |
| PReLU | max(αx, x), α learned | (−∞, ∞) | Parametric leaky |
| ELU | x if x>0 else α(e^x − 1) | (−α, ∞) | Smooth; pushes mean activations toward zero |
| GELU | x · Φ(x) ≈ 0.5x(1 + tanh(…)) | (≈−0.17, ∞) | Transformers (BERT, GPT); Φ is the standard normal CDF |
| SiLU / Swish | x · sigmoid(x) | (≈−0.28, ∞) | EfficientNet, modern LLMs |
| Mish | x · tanh(softplus(x)) | (≈−0.31, ∞) | Alternative to Swish |
| Softplus | ln(1 + e^x) | (0, ∞) | Smooth ReLU |
| Softmax | e^{xᵢ} / Σ e^{xⱼ} | (0, 1) summing to 1 | Multi-class output |
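As a working reference, here is a minimal NumPy sketch of the activations in the table. The function names and the numerically stable rewrites are our own choices, not from any particular library:

```python
import numpy as np

def sigmoid(x):
    # Logistic function via the identity sigmoid(x) = (1 + tanh(x/2)) / 2,
    # which avoids overflow for large |x|.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # PReLU has the same shape, with `slope` learned during training.
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # expm1 on the clipped input avoids overflow in the unused branch.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def gelu(x):
    # tanh approximation of x * Phi(x), as used in BERT/GPT.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):
    # a.k.a. Swish with beta = 1.
    return x * sigmoid(x)

def softplus(x):
    # ln(1 + e^x) rewritten as max(x, 0) + log1p(e^{-|x|}) for stability.
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def mish(x):
    return x * np.tanh(softplus(x))

def softmax(x, axis=-1):
    # Subtracting the max leaves the result unchanged but prevents overflow.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)
```

All of these broadcast over arrays, so for example `softmax(np.array([2.0, -1.0, 0.5]))` returns a probability vector summing to 1.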
Picking one
- Hidden layers: ReLU for CNNs, GELU/SiLU for transformers.
- Output for classification: sigmoid (binary), softmax (multi-class).
- Output for regression: linear (no activation).
- Dying ReLU: switch to Leaky ReLU or ELU, or check that the learning rate isn't too high.
- Sigmoid/Tanh in deep hidden stacks: avoid; their gradients vanish as the units saturate (see the sketch below).
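To put a number on that last point, a short sketch: sigmoid's derivative is σ(x)(1 − σ(x)), which peaks at 0.25 at x = 0, so even in the best case a gradient backpropagated through n sigmoid layers is scaled down by at least 4 per layer (ignoring the weights). The numbers here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 0.5 * (1.0 + np.tanh(0.5 * x))   # stable logistic

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)                     # maximum 0.25, attained at x = 0

print(sigmoid_grad(0.0))   # 0.25 -- the best case per layer
print(0.25 ** 10)          # ~9.5e-07: upper bound on the derivative product
                           # across 10 sigmoid layers; gradients vanish
```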