Interactive visualization of perceptron, activation functions, and neural network fundamentals
Adjust weights and bias to see how the perceptron computes its output
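The computation behind the demo can be sketched in a few lines. This is a minimal, hypothetical implementation (function and variable names are illustrative, not from the demo), using a step activation and weights chosen to realize an AND gate:

```python
import numpy as np

def perceptron(x, w, b):
    """Classic perceptron: weighted sum plus bias, passed through a step function."""
    z = np.dot(w, x) + b       # pre-activation z = w . x + b
    return 1 if z > 0 else 0   # step activation

# Illustrative weights/bias realizing a logical AND
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron(np.array([1, 1]), w, b))  # fires only when both inputs are 1
print(perceptron(np.array([1, 0]), w, b))
```

Changing `w` shifts which input combinations fire; changing `b` shifts the decision threshold, mirroring the sliders in the visualization.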
Without activation functions, neural networks remain linear regardless of depth
Linear Composition of Linear = Linear
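This collapse is easy to verify numerically: two stacked linear layers are always equal to a single linear layer with `W = W2 @ W1` and `b = W2 @ b1 + b2`. A small sketch with random matrices (shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # layer 2: 4 -> 2

x = rng.normal(size=3)
two_layer = W2 @ (W1 @ x + b1) + b2

# Collapse both layers into one linear map
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

assert np.allclose(two_layer, one_layer)
```

No matter how many linear layers you stack, the same algebra folds them into one, so depth buys nothing without a nonlinearity in between.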
Compare different activation functions and their derivatives
See how gradients propagate through different activation functions
If f'(z) ≈ 0, the gradient vanishes and earlier layers stop learning!
| Function | Gradient at large \|z\| | Gradient at z = 0 |
|---|---|---|
| Sigmoid | ≈0 (vanishing) | 0.25 |
| Tanh | ≈0 (vanishing) | 1.0 |
| ReLU | 1 (z > 0), 0 (z < 0) | undefined (0 or 1 by convention) |
| Swish | Smooth non-zero | 0.5 |
| GELU | Smooth non-zero | 0.5 |
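The table entries can be checked directly from the derivative formulas. A sketch computing each gradient at z = 0 and at large |z| (ReLU's derivative at exactly 0 is set to 0 here, one common convention):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def d_sigmoid(z):           # f'(z) = s(z)(1 - s(z)), max 0.25 at z = 0
    s = sigmoid(z)
    return s * (1 - s)

def d_tanh(z):              # f'(z) = 1 - tanh(z)^2, max 1.0 at z = 0
    return 1 - np.tanh(z) ** 2

def d_relu(z):              # 1 for z > 0, else 0 (0 at z = 0 by convention)
    return float(z > 0)

def d_swish(z):             # swish(z) = z * sigmoid(z)
    s = sigmoid(z)
    return s + z * s * (1 - s)

for name, d in [("sigmoid", d_sigmoid), ("tanh", d_tanh),
                ("relu", d_relu), ("swish", d_swish)]:
    print(f"{name:8s} f'(0)={d(0.0):.2f}  f'(10)={d(10.0):.6f}  f'(-10)={d(-10.0):.6f}")
```

Running this shows sigmoid and tanh gradients collapsing toward 0 for large |z|, while swish stays smoothly non-zero on the positive side.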
Compare linear-only networks vs networks with nonlinear activations
A feedforward network with at least one hidden layer and a nonpolynomial activation can approximate any continuous function on compact subsets of R^n to arbitrary accuracy (the universal approximation theorem)
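A minimal concrete instance of this expressive power: a one-hidden-layer ReLU network with just two hidden units represents the nonlinear function |x| exactly, since |x| = relu(x) + relu(-x). Weight matrices below are chosen by hand for this identity:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

W1 = np.array([[1.0], [-1.0]])   # hidden layer: two units, weights +1 and -1
W2 = np.array([[1.0, 1.0]])      # output layer: sum the two hidden units

xs = np.linspace(-3, 3, 7).reshape(1, -1)
y = W2 @ relu(W1 @ xs)           # network output for each input

assert np.allclose(y, np.abs(xs))
```

No purely linear network can represent |x| at all; adding one nonlinear hidden layer makes it trivial.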
| Task | Activation |
|---|---|
| Binary Classification | Sigmoid |
| Multi-class Classification | Softmax |
| Regression | Linear (none) |
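The two classification output activations from the table can be sketched as follows; the max-shift inside `softmax` is the standard trick for numerical stability and does not change the result:

```python
import numpy as np

def sigmoid(z):
    """Binary classification: squash one logit into a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

def softmax(z):
    """Multi-class classification: turn a logit vector into a distribution."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

p = sigmoid(0.3)                        # single probability
probs = softmax(np.array([2.0, 1.0, 0.1]))  # probabilities summing to 1
print(p, probs, probs.sum())
```

For regression the table's "Linear (none)" simply means the raw pre-activation is the output.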
Weights: learn "what to look at"
Bias: learn "the threshold"
Activation: determines "how to respond"
Neuron = Learnable feature transformer with nonlinear gating
The perceptron is the atomic unit of neural networks
Activation functions determine whether networks can learn complex patterns
ReLU made deep networks practically trainable by keeping gradients from vanishing
GELU and Swish make large models more stable and powerful