# George E. P. Box (1919-2013)

March was a terribly sad month for the University of Wisconsin; just days after losing Mary Ellen Rudin, George E. P. Box also passed away. He was 93.

Among laypeople, Box is best known for his aphorism that “All models are wrong, but some are useful”, a quote that would probably work well as a podcast title. However, Box’s contributions to maths went further than just a brilliant witticism.

Born in Kent in 1919, his studies in Chemistry were cut short by the Second World War, during which he taught himself statistics; after the war, he changed his degree to Mathematics and Statistics at UCL and studied under Egon Pearson to complete his PhD in 1953. After a spell working for ICI, followed by spells as a peripatetic academic, Box moved to the University of Wisconsin in 1960 to create the Department of Statistics and remained there until he ‘retired’ in 1992. I’m not sure academics ever truly retire; Box remained active in research until very recently.

His first marriage was to Jessie Ward, his second to the author Joan Fisher, then in 1985 he married Claire Quist. Curiously, both his supervisor and second wife were the offspring of great statisticians – Egon was the son of Karl Pearson, and Joan the daughter of Ronald Fisher.

One of the benefits – or possibly perils – of being self-taught is that you end up with a habit of discovering things for yourself. In Box’s case, he became one of the pioneers of modern statistics.

In 1951, Box and K.B. Wilson developed an approach to experimental design known as response surface methodology. RSM uses a two-step process: first determine which variables are significant, then fit a second-order polynomial to them. This makes it possible (and, more to the point, affordable) to distinguish between processes stuck at saddle points and processes that have reached true extrema. The Box-Behnken design is an example of this: each control variable can take one of three values (low, central or high) and, rather than test each possible combination, you pick enough combinations relatively ‘close’ to the centre to fit a quadratic model. With seven factors, for instance, you would perform around 60 experiments rather than the $3^7 = 2{,}187$ I would naively have tried.
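For the curious, here is roughly what that second-order fit looks like. The notation ($k$ factors $x_1, \dots, x_k$, coefficients $\beta$, the matrix $B$) is my own choice for illustration rather than anything lifted from Box and Wilson's paper.

$$
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon
$$

That is $1 + k + k + \binom{k}{2} = \frac{(k+1)(k+2)}{2}$ coefficients, or 36 when $k = 7$, which is why a few dozen well-chosen runs can be enough. Write the fitted surface as $\hat{y} = b_0 + \mathbf{x}^T \mathbf{b} + \mathbf{x}^T B \mathbf{x}$, where $B$ is symmetric with the $\beta_{ii}$ on its diagonal and $\beta_{ij}/2$ off it. Setting the gradient $\mathbf{b} + 2B\mathbf{x}$ to zero gives the stationary point $\mathbf{x}_s = -\tfrac{1}{2} B^{-1} \mathbf{b}$. If the eigenvalues of $B$ are all negative it is a maximum, if all positive a minimum, and mixed signs mean a saddle.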

With Mervin Muller, Box developed an ingenious method (the Box-Muller transform) to generate pairs of independent, normally distributed random variables. Given $X$ and $Y$, drawn independently from the uniform distribution on $(0,1]$, the variables $Z_0 = \sqrt{-2 \ln X} \cos(2\pi Y)$ and $Z_1 = \sqrt{-2 \ln X} \sin(2\pi Y)$ are independent and each follows a standard normal distribution (mean 0, variance 1).
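The transform fits in a few lines of Python. This is a minimal sketch using only the standard library; the function names `box_muller` and `sample_normals` and the seed are mine, chosen for illustration.

```python
import math
import random


def box_muller(x, y):
    """Map two independent uniforms on (0, 1] to two independent standard normals."""
    radius = math.sqrt(-2.0 * math.log(x))  # x must be > 0 for the log to exist
    angle = 2.0 * math.pi * y
    return radius * math.cos(angle), radius * math.sin(angle)


def sample_normals(n_pairs, seed=0):
    """Draw 2 * n_pairs values by feeding uniform pairs through box_muller."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_pairs):
        # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
        x = 1.0 - rng.random()
        y = 1.0 - rng.random()
        values.extend(box_muller(x, y))
    return values


if __name__ == "__main__":
    zs = sample_normals(50_000)
    mean = sum(zs) / len(zs)
    variance = sum((z - mean) ** 2 for z in zs) / len(zs)
    print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

With a large sample, the printed mean should sit close to 0 and the variance close to 1. The `1 - rng.random()` trick is only there to keep $X$ away from zero, matching the $(0,1]$ interval in the text.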

Had Dr Seuss been a statistician, he might have come up with the Box-Cox transformation, a method for taming data with non-constant variance using a power transformation (and, probably, foxes in socks). The transformation finds the most likely exponent, given the data, and – if you’re lucky – gives you something close to a normal distribution in the transformed data.
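As I understand it, the usual one-parameter form sends each positive value $y$ to $(y^\lambda - 1)/\lambda$, or to $\ln y$ when $\lambda = 0$, and picks the $\lambda$ that maximises a normal log-likelihood. The sketch below does that with a crude grid search. The function names and the grid from $-2$ to $2$ are my assumptions; a real analysis would use a proper optimiser and a statistics library.

```python
import math
import random


def box_cox(y, lam):
    """Apply the one-parameter Box-Cox transform to positive values."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_log_likelihood(y, lam):
    """Normal log-likelihood of the transformed data, up to an additive constant."""
    n = len(y)
    z = box_cox(y, lam)
    mean = sum(z) / n
    variance = sum((v - mean) ** 2 for v in z) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in y)


def estimate_lambda(y, grid=None):
    """Pick the exponent on a grid (assumed range -2..2) with the highest likelihood."""
    if grid is None:
        grid = [i / 100.0 for i in range(-200, 201)]
    return max(grid, key=lambda lam: profile_log_likelihood(y, lam))


if __name__ == "__main__":
    rng = random.Random(0)
    # Log-normal data: taking logs (lambda = 0) should make it normal.
    data = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(2000)]
    print("estimated lambda:", estimate_lambda(data))
```

For log-normal data like this, the estimate should land near zero, which is the transform's way of telling you to take logs.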

Box was also a field leader in time-series analysis, developing the Box-Jenkins methodology and the Ljung-Box test with collaborators Gwilym Jenkins and Greta Ljung, although I can’t tell which worked on which. The Box-Jenkins model estimates parameters for autoregressive terms (how each value depends on the values before it) and moving-average terms (how it depends on recent random shocks) for suitable time-series data; the Ljung-Box test determines whether a data series is significantly autocorrelated.
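The Ljung-Box statistic is simple enough to compute by hand: $Q = n(n+2) \sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$, where $\hat{\rho}_k$ is the sample autocorrelation at lag $k$, and $Q$ is compared with a chi-squared distribution on $h$ degrees of freedom. Here is a rough standard-library sketch. The function names are mine, and the p-value helper uses a closed form that only works for an even number of lags, a shortcut I took to avoid pulling in a statistics package. When the test is applied to the residuals of a fitted model, the degrees of freedom are usually reduced, which this sketch ignores.

```python
import math
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denominator = sum((v - mean) ** 2 for v in x)
    numerator = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numerator / denominator


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def chi2_sf_even(q, dof):
    """P(chi-squared with dof degrees of freedom > q), for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive, even dof")
    half = q / 2.0
    term = total = 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # An AR(1) series: each value leans heavily on the previous one.
    ar1 = [0.0]
    for _ in range(499):
        ar1.append(0.8 * ar1[-1] + rng.gauss(0.0, 1.0))
    for name, series in (("white noise", noise), ("AR(1)", ar1)):
        q = ljung_box_q(series, 10)
        print(f"{name}: Q = {q:.1f}, p = {chi2_sf_even(q, 10):.3g}")
```

The autocorrelated series should produce a far larger $Q$, and a far smaller p-value, than the white noise.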

Box also wrote, with William and Stu Hunter, Statistics for Experimenters, possibly *the* book in the field – yours for just £91!

As a final disclaimer, I should note that I am not a statistician. This means that, in all likelihood, these summaries are wrong, but hopefully useful.

George E. P. Box at Wikipedia

George E. P. Box at MacTutor

George Box at the American Society for Quality

### Correction

This post was updated on April 9th, 2013 because the previous version had omitted Jessie Ward. Thanks to John Hunter for pointing out the error.