In 2012, a neural network called AlexNet was shown an image it had never seen before, a photograph of a cat, and correctly identified it as a cat. This sounds trivial. But AlexNet won that year's ImageNet Large Scale Visual Recognition Challenge, a benchmark with 1,000 categories of objects, by a wide margin: the correct label was missing from its top five guesses only about 15% of the time, compared with roughly 26% for the runner-up. No programmer wrote a rule that said "cats have pointed ears and whiskers." The network learned what a cat looks like from 1.2 million labeled photographs. Understanding how that learning works requires understanding the mechanics of a neural network.

The Basic Unit: A Neuron

An artificial neuron takes several inputs, multiplies each by a weight, adds the weighted inputs together, adds a bias term, and passes the result through an activation function. The output is a single number. On its own, one neuron is not very powerful. But chain thousands of them together in layers, and the system can represent extraordinarily complex patterns.

Single neuron output:

$$y = f\left(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b\right) = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

$x_1, \ldots, x_n$ = inputs (e.g., pixel brightness values)
$w_1, \ldots, w_n$ = weights (learned from data)
$b$ = bias (a learned offset)
$f$ = activation function (adds nonlinearity)

In plain English: weight each input by its importance, sum everything up, add the bias, then pass the result through a nonlinear function.
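To make the formula concrete, here is a minimal Python sketch of one neuron that uses ReLU as the activation f. The input, weight, and bias values are made up for illustration; in a real network the weights and bias would be learned.

```python
# Minimal sketch of a single artificial neuron: weighted sum + bias, then ReLU.
# All numeric values below are illustrative, not learned.


def relu(z: float) -> float:
    """ReLU activation: 0 for negative inputs, unchanged otherwise."""
    return max(0.0, z)


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Compute y = f(w1*x1 + ... + wn*xn + b) with f = ReLU."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


x = [0.5, 0.2, 0.9]   # e.g., three pixel brightness values
w = [0.4, -0.6, 0.3]  # one weight per input
b = 0.1               # bias
y = neuron(x, w, b)
print(y)
```

With these numbers the weighted sum is 0.2 − 0.12 + 0.27 = 0.35, the bias brings it to 0.45, and because that is positive, ReLU passes it through unchanged. A negative total would come out as 0.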

The activation function is crucial. Without it, stacking layers of neurons would be mathematically equivalent to a single layer: a linear function of a linear function is still linear (strictly, affine, once biases are included), so the whole network could only ever compute linear functions of its input. Common choices like ReLU, f(x) = max(0, x), which outputs zero for negative inputs and passes positive inputs through unchanged, introduce the nonlinearity that lets networks learn curved decision boundaries and complex shapes.
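The collapse can be checked numerically. In this minimal pure-Python sketch with arbitrary example weights, two stacked layers without an activation give exactly the same output as the single merged layer with weights W = W₂W₁ and bias c = W₂b₁ + b₂, while inserting ReLU between them changes the result.

```python
# Sketch: stacking linear layers without an activation collapses to one layer.
# W1, b1, W2, b2 are arbitrary illustrative values, not taken from any real network.

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def relu_vec(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


W1: Matrix = [[1.0, -2.0], [0.5, 1.0]]
b1: Vector = [0.0, -1.0]
W2: Matrix = [[2.0, 1.0]]
b2: Vector = [0.5]

# Merge the two linear layers into one equivalent layer: W x + c.
W = matmul(W2, W1)
c = add(matvec(W2, b1), b2)

x: Vector = [1.0, 1.0]
two_linear = add(matvec(W2, add(matvec(W1, x), b1)), b2)
one_linear = add(matvec(W, x), c)
with_relu = add(matvec(W2, relu_vec(add(matvec(W1, x), b1))), b2)

print(two_linear, one_linear, with_relu)
```

The two linear layers and the merged layer both produce −1.0, while the version with ReLU in between produces 1.0. ReLU zeroes out the negative hidden value, so the network with the activation is no longer a linear function of x and cannot be merged into one layer.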

Layers: Building Depth

A neural network organizes neurons into layers. The input layer receives raw data: for AlexNet, the pixels of a 224×224 color image, which is 224×224×3 numbers (one per pixel per color channel). The output layer produces predictions, one probability per category. In between are hidden layers, where the network builds increasingly abstract representations. Visualizations of trained networks suggest a typical progression: early layers detect edges and color gradients, middle layers combine edges into shapes, deeper layers combine shapes into object parts, and the final layers combine parts into whole objects.
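A forward pass through the layers can be sketched in a few lines of plain Python. This is a hypothetical toy network (4 inputs, 3 hidden ReLU neurons, 3 output categories) with fully connected layers and arbitrary, untrained weights; AlexNet's convolutional layers are not modeled. A softmax at the end turns the raw output scores into one probability per category, as AlexNet's 1,000-way softmax output layer did.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def dense(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """One fully connected layer: each row of `weights` is one neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, z) for z in v]


def softmax(logits: Vector) -> Vector:
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy weights: 4 inputs -> 3 hidden neurons -> 3 categories.
W_hidden: Matrix = [[0.2, -0.1, 0.4, 0.0],
                    [-0.3, 0.5, 0.1, 0.2],
                    [0.1, 0.1, -0.2, 0.3]]
b_hidden: Vector = [0.0, 0.1, -0.1]
W_out: Matrix = [[0.5, -0.2, 0.1],
                 [-0.4, 0.3, 0.2],
                 [0.1, 0.1, -0.5]]
b_out: Vector = [0.0, 0.0, 0.0]

categories = ["cat", "dog", "car"]
x: Vector = [0.9, 0.1, 0.4, 0.7]  # stand-in for raw pixel values

hidden = relu(dense(W_hidden, b_hidden, x))    # hidden layer
probs = softmax(dense(W_out, b_out, hidden))   # output layer
for name, p in zip(categories, probs):
    print(f"{name}: {p:.3f}")
```

With untrained weights the probabilities are essentially arbitrary; training is what pushes the "cat" probability up for cat photos.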

AlexNet had 8 learned layers (5 convolutional and 3 fully connected) and about 60 million parameters (weights and biases), and it took five to six days to train on two GPUs. Some of today's largest vision networks have hundreds of layers or billions of parameters. Depth is what makes "deep learning" deep.
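Most of that parameter count is simple arithmetic on layer sizes. The sketch below counts only the fully connected layers, assuming the layer sizes reported for AlexNet: a 6×6×256 feature map from the last convolutional stage feeding layers of 4,096, 4,096, and 1,000 neurons.

```python
# Parameter count of a fully connected layer: one weight per (input, output)
# pair plus one bias per output neuron.


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


# Assumed AlexNet fully connected sizes (input size, number of neurons).
fc_layers = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]
fc_counts = [dense_params(n_in, n_out) for n_in, n_out in fc_layers]

for (n_in, n_out), count in zip(fc_layers, fc_counts):
    print(f"{n_in:>5} -> {n_out:>4}: {count:,} parameters")
print(f"fully connected total: {sum(fc_counts):,}")
```

Under these assumptions the three fully connected layers hold about 58.6 million parameters, the bulk of AlexNet's roughly 60 million. The five convolutional layers, which reuse small filters across the whole image, contribute comparatively few.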

Learning: Backpropagation

The network starts with random weights. It sees a labeled example, a cat photo with the label "cat", and produces an output. Initially, that output is almost certainly wrong. The loss function measures how wrong. For classification, a common choice (and the one AlexNet used) is cross-entropy: the negative logarithm of the probability the network assigned to the correct label, −log p("cat"). It is zero when that probability is 1.0 and grows without bound as the probability approaches 0. Backpropagation then computes how much each weight contributed to the error by working backward through the network with the chain rule of calculus. Finally, gradient descent adjusts every weight slightly in the direction that reduces the error.
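Backpropagation is the chain rule applied one layer at a time. The sketch below uses a hypothetical network with one input, one ReLU hidden unit, and one sigmoid output giving the probability of "cat", trained with cross-entropy; the single sigmoid output stands in for AlexNet's 1,000-way softmax. The `backward` function derives each gradient by hand, and `numeric_grad` checks them by nudging each parameter slightly and measuring how the loss changes.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(params: dict[str, float], x: float, y: float) -> tuple[float, tuple[float, float, float]]:
    """Tiny net: x -> ReLU hidden unit -> sigmoid output p = P("cat")."""
    a = params["w1"] * x + params["b1"]
    h = max(0.0, a)
    z = params["w2"] * h + params["b2"]
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))  # cross-entropy
    return loss, (a, h, p)


def backward(params: dict[str, float], x: float, y: float) -> dict[str, float]:
    """Backpropagation: apply the chain rule from the loss back to each parameter."""
    _, (a, h, p) = forward(params, x, y)
    dz = p - y                         # dloss/dz for sigmoid + cross-entropy
    dh = dz * params["w2"]             # back through z = w2*h + b2
    da = dh * (1.0 if a > 0 else 0.0)  # back through the ReLU
    return {"w2": dz * h, "b2": dz, "w1": da * x, "b1": da}


def numeric_grad(params: dict[str, float], x: float, y: float, eps: float = 1e-6) -> dict[str, float]:
    """Central finite-difference estimate, used only to check backward()."""
    grads = {}
    for k in params:
        up = {**params, k: params[k] + eps}
        down = {**params, k: params[k] - eps}
        grads[k] = (forward(up, x, y)[0] - forward(down, x, y)[0]) / (2 * eps)
    return grads


params = {"w1": 0.8, "b1": 0.1, "w2": -0.5, "b2": 0.2}
x, y = 1.5, 1.0  # one training example labeled "cat" (y = 1)

analytic = backward(params, x, y)
numeric = numeric_grad(params, x, y)
for k in params:
    print(f"{k}: backprop={analytic[k]:+.6f}  finite-diff={numeric[k]:+.6f}")
```

The two columns should agree to about six decimal places. Finite differences need two forward passes per parameter, which is hopeless for 60 million parameters; backpropagation gets every gradient from one forward and one backward pass, which is what makes training large networks practical.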

One training step:

1. Forward pass: compute the output given the current weights.
2. Compute the loss: how wrong is the prediction?
3. Backward pass (backpropagation): compute ∂loss/∂weight for every weight.
4. Update: weight ← weight − learning_rate × ∂loss/∂weight.

Repeat over millions of examples, and the weights gradually settle into useful values.
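The four steps can be sketched as a complete training loop, assuming the simplest possible "network": one sigmoid neuron learning logical AND on a hypothetical four-example dataset. It uses full-batch gradient descent, averaging the gradient over all examples before each update; AlexNet instead updated its weights on mini-batches of 128 images. The weights start at zero so the run is reproducible. That is acceptable for a single neuron, but multi-layer networks need random initialization so that hidden neurons do not all learn the same thing.

```python
import math

# Hypothetical toy dataset: label is 1.0 only when both inputs are 1 (logical AND).
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
n = len(data)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Forward pass of a single sigmoid neuron."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def mean_loss(preds: list[float]) -> float:
    """Average cross-entropy between predictions and the dataset's labels."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, (_, y) in zip(preds, data)) / n


w, b = [0.0, 0.0], 0.0  # zero start, for reproducibility
learning_rate = 1.0
history = []

for step in range(2000):
    preds = [predict(w, b, x) for x, _ in data]       # 1. forward pass
    history.append(mean_loss(preds))                  # 2. compute loss
    errs = [p - y for p, (_, y) in zip(preds, data)]  # 3. backward: dloss/dz = p - y
    grad_w = [sum(e * x[i] for e, (x, _) in zip(errs, data)) / n for i in range(len(w))]
    grad_b = sum(errs) / n
    w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]  # 4. update
    b -= learning_rate * grad_b

final_preds = [predict(w, b, x) for x, _ in data]
print(f"loss: {history[0]:.3f} -> {mean_loss(final_preds):.3f}")
for (x, y), p in zip(data, final_preds):
    print(x, "label", y, "prediction", round(p, 3))
```

The loss starts at about 0.693 (that is, −log 0.5, a coin flip) and falls steadily, and by the end every prediction is on the correct side of 0.5.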

After many passes through the training set (AlexNet made roughly 90 passes over its 1.2 million images), each made up of these four-step adjustments, the weights encode visual features that distinguish cats from dogs from cars from chairs. Nobody programmed those features; they emerge because they help minimize prediction error across the training set.

Other Applications

The same basic ingredients (inputs, weighted connections, hidden layers, outputs, and training by backpropagation) are adapted for text (language models), audio (speech recognition), protein structure prediction (AlphaFold), and drug discovery. The specific architecture changes for each domain, but the core learning mechanism is shared. Neural networks are not told in advance which features to look for; they discover useful representations from data.

Conclusion

A neural network is a layered system of simple computation units whose connections are tuned by exposure to data. Each neuron computes a weighted sum followed by a nonlinear activation. Layers build increasingly abstract representations. Backpropagation computes how much each weight contributed to the prediction error, and gradient descent uses those gradients to adjust the weights. Repeat across millions of examples and the network learns, not from explicit rules, but from patterns in data. This mechanism, refined over decades, is what let AlexNet recognize cats and what underlies the capabilities of modern AI systems.