Recently I came across an interesting idea about little mistakes in counting problems that actually don’t amount to much. In A Problem Squared 030, Matt Parker was investigating the question “What are the odds of having the same child twice?” and made some simplifying assumptions when thinking about DNA combinatorics. He justified leaving out a small number of things when counting an astronomical number of things by going through an example from the lottery.

The current UK lottery draws 6 balls from 59, so there are \(\binom{59}{6}=45,\!057,\!474\) possible tickets. This is where the familiar ‘one in 45 million’ figure comes from, and the probability of winning the jackpot with a single ticket is a tiny

\[ \frac{1}{45057474} = 0.00000002219387620 \text{.}\]
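This figure is easy to reproduce. Here is a minimal Python sketch using `math.comb` from the standard library (Python 3.8 or later); the variable names are purely illustrative.

```python
from math import comb

BALLS = 59   # balls in the machine
DRAWN = 6    # balls drawn per game

total_tickets = comb(BALLS, DRAWN)   # number of distinct 6-ball tickets
p_win = 1 / total_tickets            # chance that one ticket hits the jackpot

print(f"{total_tickets:,}")          # 45,057,474
print(f"{p_win:.9e}")                # probability in scientific notation
```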

Matt posits the idea that somewhere along the way we forget to include some tickets:

But let’s say along the way while I’m working it out, for strange reasons I go ‘oh you know what, I’m going to ignore all the options which are all square numbers. You know, I just can’t be bothered including them. Yeah, they’re legitimate lottery tickets, but just to make the maths easier I’m going to ignore them’. And people are getting up in arms, and they’re like ‘you can’t ignore them, they’re real options’.

Let’s imagine that we try to work out the odds of winning based on a model of the lottery that leaves out the tickets that have all square numbers on them, and see how that compares with the true probability.

With numbers from 1 to 59, there are 7 square numbers (1, 4, 9, 16, 25, 36 and 49), so there are \(\binom{7}{6}=7\) possible lottery tickets that are all squares. This reduces our \(45,\!057,\!474\) to \(45,\!057,\!467\), and

\[ \frac{1}{45057467} = 0.00000002219387965 \text{.}\]

So our inaccurate model of lottery odds comes up with basically the same probability – the two numbers are the same for the first seven significant figures. Matt’s point is that overlooking this small number of cases hasn’t changed our probability meaningfully, so won’t affect any conclusions we draw from it. We’ve modelled the lottery but left out some possible valid tickets, and it doesn’t matter.
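The all-squares count can be checked by listing the squares directly instead of trusting the "7". This is a rough sketch using `math.isqrt` and `math.comb`; the names are illustrative.

```python
from math import comb, isqrt

BALLS, DRAWN = 59, 6

# Perfect squares that appear on a ball
squares = [n for n in range(1, BALLS + 1) if isqrt(n) ** 2 == n]

total_tickets = comb(BALLS, DRAWN)
all_square_tickets = comb(len(squares), DRAWN)   # tickets drawn only from the squares

p_true = 1 / total_tickets
p_model = 1 / (total_tickets - all_square_tickets)

print(squares)               # [1, 4, 9, 16, 25, 36, 49]
print(all_square_tickets)    # 7
print(f"{p_true:.9e}")       # true probability
print(f"{p_model:.9e}")      # probability under the squares-free model
```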

Matt then claimed that ignoring all tickets made up entirely of odd numbers would leave the first two significant figures of the probability unchanged.

There are 30 odd numbers from 1-59, so we’re talking about all the tickets using only these 30 numbers. There are \( \binom{30}{6}=593,\!775\) such tickets, and ignoring them leaves \(44,\!463,\!699\) tickets, which changes our probability to

\[ \frac{1}{44463699} = 0.00000002249025660 \text{.}\]

Even though we’ve ignored just over half a million possible tickets, the probability is the same to the first two significant figures.
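One way to see why the leading digits survive is to look at the relative change. If \(N\) is the true number of tickets and \(k\) of them are ignored, the model's probability \(1/(N-k)\) is larger than \(1/N\) by a factor of \(N/(N-k)\), which is a relative increase of \(k/(N-k)\). A small sketch of that calculation follows (the symbols \(N\) and \(k\) and the variable names are introduced here for illustration only).

```python
from math import comb

total_tickets = comb(59, 6)
all_odd_tickets = comb(30, 6)          # tickets using only the 30 odd numbers
remaining = total_tickets - all_odd_tickets

# (1/remaining) / (1/total_tickets) - 1 simplifies to ignored / remaining
relative_increase = all_odd_tickets / remaining

print(all_odd_tickets)                 # 593775
print(remaining)                       # 44463699
print(f"{relative_increase:.2%}")      # 1.34%
```

Here a shift of roughly 1.3% moves the third significant figure and leaves the first two alone.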

Next Matt said the first significant figure would still be unchanged if we ignore “all the lottery tickets that have a 7 on them”. I wasn’t sure whether he meant tickets that include ball number 7 or tickets that include any number with a digit 7 in it.

First, let’s consider the tickets that choose literally ball number 7 and then five other numbers. There are 58 balls that aren’t 7, and we want to draw 5 of them, so the number of these tickets is \(\binom{58}{5}=4,\!582,\!116\). Ignoring them leaves \(40,\!475,\!358\) tickets, which makes the probability

\[ \frac{1}{40475358} = 0.00000002470639049 \]

which matches to one significant figure.
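The count of tickets containing ball 7 can be cross-checked in two other ways: as all tickets minus those that avoid ball 7, and as a fraction \(6/59\) of all tickets, since by symmetry every ball appears on the same share of tickets. A sketch with illustrative names:

```python
from math import comb

total_tickets = comb(59, 6)

with_ball_7 = comb(58, 5)                      # fix ball 7, choose 5 of the other 58
by_complement = total_tickets - comb(58, 6)    # all tickets minus those avoiding ball 7
by_symmetry = total_tickets * 6 // 59          # each ball is on 6/59 of all tickets

print(with_ball_7, by_complement, by_symmetry)  # 4582116 4582116 4582116
print(total_tickets - with_ball_7)              # 40475358
```

Seen this way, dropping ball 7 discards about 10% of all tickets, which is consistent with only the leading digit surviving.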

The alternative is to think about tickets that have a digit 7 on them anywhere. There are six numbers from 1-59 that involve a digit 7 (7, 17, 27, 37, 47 and 57). We can count all the tickets using only numbers without a digit 7 as \(\binom{53}{6}=22,\!957,\!480\), and then work out the number that do include a 7 as \(\binom{59}{6}\) minus this (which automatically covers tickets with several such numbers). This is \(22,\!099,\!994\) tickets, and ignoring them leaves only the \(22,\!957,\!480\) tickets without a 7, making the probability

\[ \frac{1}{22957480} = 0.00000004355878781 \text{.} \]

At last, a meaningful difference! Well, at least, a change to the leading significant figure – don’t go betting the farm on \(4.4\times 10^{-8}\) odds, will you? I checked with Matt and yes, the first of these is the calculation he had done.
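As a sanity check on the digit-7 count, the complement argument can be compared with a direct count that chooses how many of the six "7" balls are on the ticket. This is a hedged sketch, not the calculation from the podcast, and the names are illustrative.

```python
from math import comb

BALLS, DRAWN = 59, 6

sevens = [n for n in range(1, BALLS + 1) if "7" in str(n)]   # balls showing a digit 7
others = BALLS - len(sevens)                                  # 53 balls with no digit 7

# Complement: every ticket minus the tickets built only from 7-free balls
by_complement = comb(BALLS, DRAWN) - comb(others, DRAWN)

# Direct: choose j >= 1 of the six "7" balls and fill the rest from the other 53
direct = sum(comb(len(sevens), j) * comb(others, DRAWN - j)
             for j in range(1, DRAWN + 1))

print(sevens)                  # [7, 17, 27, 37, 47, 57]
print(by_complement, direct)   # 22099994 22099994
print(comb(others, DRAWN))     # 22957480
```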

I’m amused that what seems like a minor change, from ‘ball 7’ + 5 others to ‘at least one of six balls’ + the rest, is big enough to make such a difference, when ignoring all the wholly-odd-numbered tickets barely made a dent.
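To put the four models side by side, here is a rough sketch that counts how many leading significant figures each model's probability shares with the true one. The helper `matching_sig_figs` is a hypothetical function written for this illustration; it compares digit strings, so it counts agreement by truncation and assumes positive inputs.

```python
from math import comb

def matching_sig_figs(a: float, b: float, digits: int = 12) -> int:
    """Count the leading significant digits that a and b share (positive inputs)."""
    mant_a, exp_a = f"{a:.{digits}e}".split("e")
    mant_b, exp_b = f"{b:.{digits}e}".split("e")
    if exp_a != exp_b:          # different order of magnitude: nothing matches
        return 0
    count = 0
    for da, db in zip(mant_a.replace(".", ""), mant_b.replace(".", "")):
        if da != db:
            break
        count += 1
    return count

total = comb(59, 6)
ignored = {
    "all squares": comb(7, 6),
    "all odd": comb(30, 6),
    "ball 7": comb(58, 5),
    "any digit 7": total - comb(53, 6),
}

p_true = 1 / total
for label, k in ignored.items():
    p_model = 1 / (total - k)
    matches = matching_sig_figs(p_true, p_model)
    print(f"{label:12s} {k:>10,d} {p_model:.4e} {matches}")
```

For the four cases in this post the helper reports 7, 2, 1 and 0 matching figures respectively. Because it works by truncation, a pair such as 1.999 and 2.001 would count as zero matching figures despite being close, so treat it as a quick illustration and not a general measure of agreement.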

Anyway, I thought this was a cute idea, bringing together modelling errors and combinatorics in a nice way.

On Twitter, Jacob Christian Munch Anderson (@NoHatCoder) helpfully pointed out I’d made a mistake in the final calculation on the page, now corrected. Thanks!