The Book of Why: The New Science of Cause and Effect
“The Book of Why offers a popular science history and explanation of the mathematics of causation. . . . written in a simple and entertaining style . . .”
The Book of Why is a popular science treatment of the mathematics of cause and effect. The mathematics of causation has broad applicability, for example in evaluating the effectiveness of medicines or laws, and, so the authors claim, in predicting the future generally.
Causal inference is the scientific search for root causes, expressed through mathematics. The Book of Why began as an academic paper by Judea Pearl that grew into a collaboration with Dana Mackenzie. Pearl explains that the idea behind The Book of Why took a long time to mature because he was slow to appreciate the gap between the language of cause and effect used by scientists and the language of cause and effect used by the rest of us.
The problem, according to Pearl and Mackenzie, is that probability and statistics can show correlation but are causality-free. For example, the mathematics of correlation can show that a rooster crows before sunrise; to conclude that the rooster’s crow causes the sun to rise, however, would be false causation. Because the vocabulary was missing from probability and statistics, it was difficult for scientists to reconcile cause and effect as normally understood with cause and effect as it could be expressed mathematically.
Before Pearl’s work, there was no mathematical notation to express cause and effect. Mathematically we can say X and Y are “related” or “associated,” but until Pearl’s invention of the “do” calculus there was no notation to express X “is the cause of” Y. Without a notation for causation, scientists’ exploration of cause and effect was hampered.
Pearl’s breakthrough was to replace the (vague) relationship of correlation with an action and a probability of outcome. The action is called a “do,” as in: if you do X, what is the likelihood of Y? Or: if Y happened, what is the likelihood that doing X caused it? The most interesting case, according to Pearl and Mackenzie, would be: if X happened before Y, and I went back in time to prevent X, would Y still occur?
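The difference between seeing X and doing X can be sketched in a toy simulation (the model and numbers below are my own illustration, not from the book): a hidden common cause drives both X and Y, so the two are strongly correlated even though forcing X has no effect at all on Y, much like the rooster and the sunrise.

```python
import random

random.seed(0)

def trial(do_x=None):
    """One sample from a toy model with a hidden common cause z.

    z drives both x and y; x has no causal effect on y whatsoever.
    Passing do_x forces x, mimicking Pearl's do-operator.
    """
    z = random.random() < 0.5
    x = do_x if do_x is not None else (random.random() < (0.9 if z else 0.1))
    y = random.random() < (0.8 if z else 0.2)
    return x, y

n = 100_000

# Observation: among samples where x happened, how often did y happen?
obs = [trial() for _ in range(n)]
p_y_given_x = sum(y for x, y in obs if x) / sum(x for x, _ in obs)

# Intervention: force x in every sample, then see how often y happens.
intv = [trial(do_x=True) for _ in range(n)]
p_y_do_x = sum(y for _, y in intv) / n

print(f"P(Y | X)     = {p_y_given_x:.2f}")   # about 0.74: strong correlation
print(f"P(Y | do(X)) = {p_y_do_x:.2f}")      # about 0.50: no causal effect
```

Conditioning inflates the estimate because observing X signals that the hidden cause is probably present; intervening breaks that link and reveals that X has no causal effect on Y.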
The Book of Why is made understandable to a general audience through simple examples, causal (path) diagrams, and explanations; Pearl and Mackenzie provide the more abstract symbolic notation for those who want to dig in further. A path diagram consists of line segments with arrows: the start of each arrow is a cause, the pointy end is its effect, and the path from start to pointy end corresponds to a “what if” scenario. The more rigorous part is assigning a likelihood to each path, where the likelihoods correspond to probabilities; the symbolic notation then permits expressing “what if” questions in mathematical form.
Having a symbolic logic enables what Pearl and Mackenzie call “the calculus of causation,” which permits an exploration of the “algorithm of counterfactuals.” Counterfactuals are events that did not happen but could have, had something else not intervened. Counterfactuals can be used as a gateway for exploring how the future might be changed by intervention in the present.
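Pearl’s general recipe for evaluating a counterfactual has three steps: abduction (infer the hidden background conditions from what was observed), action (change the variable in question), and prediction (recompute the outcome). A minimal sketch in a hypothetical linear model (the model and numbers are illustrative, not taken from the book):

```python
def counterfactual(x_obs, y_obs, x_new, slope=2.0):
    """Three-step counterfactual in a toy structural model y = slope*x + u."""
    u = y_obs - slope * x_obs   # abduction: infer the background term u
    return slope * x_new + u    # action + prediction: recompute y with x_new

# Observed: x was 1 and y came out 5.  Had x been prevented (set to 0),
# the same background conditions imply y would have been 3.
y_cf = counterfactual(x_obs=1.0, y_obs=5.0, x_new=0.0)
print(y_cf)  # 3.0
```

The key move is that the observed outcome pins down the unobserved background term, so the “what if” world shares everything with the actual one except the intervened variable.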
Pearl and Mackenzie detail the “do” calculus and provide examples and analyses. Thankfully, they define their terms before use; for example, to “do X” is to perform an intervention, and a “do” followed by an observation of what happens next is an experiment.
However, Pearl and Mackenzie are sometimes slippery in their explanations; they do not define “causation” apart from a probability. That is, causation for Pearl and Mackenzie is not an assertion but a question: “If you do X, what is the likelihood of Y?”
Along the same lines, the limits of what the “do” calculus can do are not identified until two-thirds of the way through the book. The delay is a severe disservice to the reader: when the limits are finally explained, the “do” calculus turns out to be more a verifier of solutions than a searcher for them.
As part of the narrative, Pearl and Mackenzie offer a history of breakthroughs in statistics and probability related to the science of causation; the mathematicians mentioned (some briefly, some in depth) include Francis Galton, Karl Pearson, Sewall Wright, Sir Harold Jeffreys, R. A. Fisher, and Claude Berou. Here, Pearl and Mackenzie point out the accidents of history that led statisticians to detect, but then avoid, causation in the mathematics of statistics and probability.
The first mathematician of note is Francis Galton. Galton created the concept of correlation in statistics but did not take the idea further. Pearl and Mackenzie write, “It is an irony of history that Galton started out in search of causation and ended up discovering correlation, a relationship that is oblivious of causation.”
Galton’s discoveries were followed by those of his protégé, Karl Pearson. Pearson established the field of mathematical statistics, and Pearl and Mackenzie note that while Galton removed causation from statistics, Pearson took the further step of removing causation from science. When Pearson discovered a correlation that could not be explained by causation, he disregarded it, calling it a “spurious correlation.” Pearl and Mackenzie conclude this section with an emphatic, “What a missed opportunity!”
The next statistician, Sewall Wright, was a geneticist who added to Darwin’s theory by writing a seminal paper in 1920 on genetics, which built a bridge between causality and probability. Pearl and Mackenzie claim Wright’s paper offers “the first proof that the mantra ‘Correlation does not imply causation’ should give way to ‘Some correlations do imply causation.’” Wright was also the first to use path diagrams for causation. But despite Wright’s ground-breaking discoveries, his work was neglected.
Wright’s path diagrams were not rediscovered until 1953, by Herbert Simon, a scientist who did pioneering research on complex systems and artificial intelligence. They were rediscovered again in the 1960s by economists and sociologists, though reworked toward different ends. The major problem, according to Pearl and Mackenzie, was that path diagrams are not enough by themselves: the diagrams need supporting mathematics, and that mathematics cannot be “canned” but must be reworked for every new case.
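The kind of supporting mathematics Wright supplied is his rule of path coefficients: for standardized variables, the correlation between two variables is the sum, over every path connecting them in the diagram, of the product of the coefficients along that path. A sketch with made-up coefficients (my own illustration, checked against a simulation of the corresponding linear model):

```python
import math
import random

random.seed(42)

# Hypothetical diagram: Z -> X (a), Z -> Y (b), X -> Y (c),
# all variables standardized to unit variance.
a, b, c = 0.6, 0.5, 0.3

# Wright's rule: the direct path X -> Y contributes c, and the
# connecting path X <- Z -> Y contributes a*b.
predicted = c + a * b   # 0.30 + 0.30 = 0.60

# Verify by simulating the linear model behind the diagram.
n = 200_000
sx = math.sqrt(1 - a * a)                            # keeps var(X) = 1
sy = math.sqrt(1 - (b * b + c * c + 2 * a * b * c))  # keeps var(Y) = 1
xs, ys = [], []
for _ in range(n):
    z = random.gauss(0, 1)
    x = a * z + random.gauss(0, sx)
    y = b * z + c * x + random.gauss(0, sy)
    xs.append(x)
    ys.append(y)

mx, my = sum(xs) / n, sum(ys) / n
sample_corr = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(f"predicted {predicted:.2f}, simulated {sample_corr:.2f}")
```

With unit variances the sample covariance is the correlation, and the simulated value lands on Wright’s predicted 0.60, which illustrates why each new diagram needs its own bookkeeping: the paths to be summed change with every model.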
Sir Harold Jeffreys is the next mathematician of note; Jeffreys provided the foundation for the Bayesian approach to probability and statistics. Thomas Bayes was a reverend and mathematician in the 18th century. The value of the Bayesian approach is that it allows for new information to be combined with old, to improve odds-making when circumstances change. Bayesian statistics can be restated as a question, “How much evidence would it take to convince us that something we consider improbable actually happened?”
When Bayes asked this question in 1750, the context was theological; Bayes asked it in the consideration of miracles. However, Bayes’ question was not appropriate for its time and Bayes did not follow it through. Denying miracles in the 1700s could lead to fines and imprisonment, as had happened to theologian Thomas Woolston in 1729. The Bayesian approach was not fully accepted until the 1970s and is currently used in error detection and correction in data transmission, in Google page ranking, and in epidemiology.
Bayesian statistics grew into favor through the science of epidemiology: for example, given two populations, compare what happens when vaccines are given with what happens when they are not. Pearl and Mackenzie address methods of solving causal problems of this type using the mathematics of probability.
There are two directions from which to attack these causal problems. The first, forward probability, deduces effect from cause (what may happen if I do this?). The second, reverse probability or the inverse method, determines cause from effect (this happened; what might have caused it?). Calculating forward probability, Pearl writes, is “robust”: there is a straightforward calculation to answer questions of the type, if I have a disease, what is the probability that a test for the disease will detect it?
In solving inverse problems there can be more than one contributing cause: if I have a disease, what caused it? Multiple candidate causes are called “confounding variables,” and if there is more than one contributing cause, a probability must be assigned to each.
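Forward and inverse probability meet in Bayes’ rule. A sketch using the stock disease-testing example (the numbers are hypothetical, not from the book): the forward probability is the test’s sensitivity, while the inverse question, what is the chance of disease given a positive test, combines the sensitivity with the prior prevalence and the false-positive rate.

```python
def posterior(prior, sensitivity, false_pos):
    """Inverse step via Bayes' rule: P(disease | positive test)."""
    p_positive = prior * sensitivity + (1 - prior) * false_pos
    return prior * sensitivity / p_positive

prior = 0.001        # 1 in 1,000 people have the disease
sensitivity = 0.99   # forward probability: P(test+ | disease)
false_pos = 0.05     # P(test+ | no disease)

p1 = posterior(prior, sensitivity, false_pos)
print(f"after one positive test:  {p1:.3f}")   # about 0.019

# Bayesian updating: the posterior becomes the new prior when a
# second, independent positive test arrives.
p2 = posterior(p1, sensitivity, false_pos)
print(f"after two positive tests: {p2:.3f}")   # about 0.282
```

Reusing each posterior as the next prior is exactly the “combine new information with old” updating described above: one positive result barely moves an improbable hypothesis, while accumulating evidence does.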
Pearl and Mackenzie explain that there are methods for simplifying inverse problems; in particular, one can prune the set of confounding variables under consideration. The subjectivity involved in choosing what to prune is controversial: one might cherry-pick causes and reach false solutions, or prune away the very cause one is searching for. Evaluating, pruning, and recalculating is, in effect, Bayesian statistics.
Pearl and Mackenzie address the benefits and drawbacks of Bayesian statistics. The advantage is that the methods are explanatory: you can follow every step, understand how it works, and take advantage of new information. The disadvantage is that new information must be vetted before use, because whether by misuse or by poor judgement, solutions can be driven away from the truth; consider, for example, the games vendors play to inflate their Google page rankings.
The need to remove confounding variables (and gaming) from vaccine trials led to the invention of the randomized controlled trial, or RCT. The intent of the RCT is to avoid confounding, that is, to avoid undiscovered variables that have an unexpected causal relationship with both the treatment and the outcome.
Pearl and Mackenzie next examine non-experimental studies: studies in which you don’t know what you are looking for, or have trouble separating what you meant to do from what you actually did, or where ethical issues prevent the use of RCTs. For non-experimental studies, confounding variables cannot be removed by statistics alone.
However, certain methods can remove confounding-variable bias, and removing bias from non-experimental studies allows the researcher to quantify uncertainty. The value of quantifying uncertainty, according to Pearl and Mackenzie, is that “an uncertain answer to the right question is much better than a highly certain answer to the wrong question.” For Pearl and Mackenzie, removing bias from studies was the step needed before the question “Does smoking cause cancer?” could, finally, be answered.
Pearl and Mackenzie address paradoxes that can arise from causal analysis and, from these, identify the limits of applying it. The accuracy of conclusions depends heavily on the choices and assumptions going in, whether or not the “do” calculus is used. Given these limitations, it would have been interesting to see Pearl and Mackenzie apply causal analysis to the science of climate change, or survey the causal analyses done on it to date.
To sum up, The Book of Why offers a popular science history and explanation of the mathematics of causation. Though written in a simple and entertaining style, there are occasions where the explanations remain difficult to follow. The “do” calculus as a method for causal analysis appears to have the most value where there is statistical data to lean on, and though the book claims the “do” calculus will solve the problem of hard AI, little of substance on using it to solve AI-type problems is presented.
The Book of Why has end notes and an index.