A spam filter receives an email with the words "congratulations," "winner," "click here," and "free prize." Each word individually is suspicious. But how do they combine? Does seeing all four words make it 4 times as likely to be spam as seeing just one? Or more? And what if the sender is your grandmother — does that override the suspicious words? Bayesian networks are the mathematical tool for answering these questions: reasoning through chains of probabilistic cause and effect.
The Problem with Simple Probability
Naive spam filters multiply the probabilities of individual suspicious words, treating each as independent. But words in emails aren't independent — a message about "free prizes" is also likely to contain "congratulations," so these words are correlated. Multiplying their probabilities as if they're independent double-counts evidence and produces badly calibrated estimates. Bayesian networks fix this by explicitly modeling which variables cause which others.
The Structure: A Causal Graph
A Bayesian network is a directed graph where each node is a variable and each arrow represents a direct causal or probabilistic influence. For the spam example: "Email is Spam" causes "Contains 'free'" and "Contains 'congratulations'" and "Contains 'click here'" — but those three word variables don't cause each other. They're all caused by the same root (spam status). In graph terms, they're conditionally independent given the spam status: once you know whether it's spam, knowing about one word tells you nothing extra about the others.
The network encodes the insight that W1, W2, and W3 are independent once you know S — so instead of needing a probability table with 2⁴ = 16 entries, you need just P(S), P(W1|S), P(W2|S), P(W3|S) — 5 numbers. This compression grows dramatically with more variables.
Bayesian Inference: Updating Beliefs
The power of a Bayesian network is inference: given observed evidence, compute the probability of unobserved variables. For spam detection: you observe which words appear in the email. Using Bayes' theorem, the network computes the posterior probability of spam given those observations. Each word that's more common in spam than legitimate email increases the spam probability; each word more common in legitimate email decreases it.
If you add sender information — "sender is in my contacts" — the network incorporates this as another node: "Known Sender" reduces P(Spam) substantially, even if the words are suspicious. The network balances all evidence simultaneously.
Medical Diagnosis
Bayesian networks were famously applied to medical diagnosis in the early 1990s. A network for diagnosing chest pain has nodes for possible diseases (heart attack, pulmonary embolism, anxiety), symptoms (chest pain, shortness of breath, sweating), and test results (ECG, troponin levels). Doctors enter symptoms and test results as observed evidence; the network computes the probability of each diagnosis. The QMR-DT system, one of the first, encoded knowledge from thousands of medical cases into a Bayesian network and achieved diagnostic accuracy comparable to specialist physicians.
Other Applications
Bayesian networks model gene regulatory interactions in biology, fault diagnosis in complex machinery (given these sensor readings, what component has most likely failed?), weather forecasting (given today's pressure and humidity, what's tomorrow's rain probability?), and financial risk assessment. Anywhere multiple uncertain variables influence each other through known causal pathways, a Bayesian network provides a principled probabilistic model.
Conclusion
Bayesian networks represent probabilistic cause-and-effect relationships in a graph, then use Bayes' theorem to reason from observed evidence to unobserved causes. The graph structure encodes independence assumptions — once you know the cause, the effects don't give you extra information about each other — compressing a complex joint probability into manageable pieces. For spam filtering, medical diagnosis, and machine fault detection, this framework provides a mathematically rigorous way to update beliefs as evidence accumulates.