This is the penultimate match before we find out who is the World’s Most Interesting Mathematician (2019 edition, of the 16 people who were asked to take part and were available in July).
For the second semi-final, from group 3 it’s Sophie Carr up against the winner of group 4, Becky Warren. The pitches are below, and at the end of this post there’s a poll where you can vote for your favourite bit of maths.
Take a look at both pitches, vote for the bit of maths that made you do the loudest “Aha!”, and if you know any more cool facts about either of the topics presented here, please write a comment below!
Sophie Carr – Simpson’s paradox
Sophie Carr is the founder of Bays Consulting. Having grown up building Lego spaceships, she studied aeronautical engineering before discovering Bayesian Belief Networks, which led to a career she loves – essentially making a living out of finding patterns. She prefers fresh coffee over tea, pears over apples and her favourite flower is the tulip. She’s @SophieBays on Twitter.
I genuinely did not expect to be writing a fourth pitch, so having to choose another “oooh” topic posed a conundrum. After much pondering, I decided to stay with the theme of counter-intuitive maths and a topic that can be seen in: wages, taxes, medicine, exam grades, average batting score (cricket, baseball, etc) and so the list goes on. Specifically, what I want to talk about is how a trend that’s seen in different groups of data can, when grouped, either disappear or reverse. This is known as Simpson’s (and sometimes the Simpson-Yule) paradox.
How can data trends reverse? Surely a trend or a pattern is just that? I’ve always found maths easier to understand in context, so this pitch is about a CEO who wants to reward their best sales team (Pure (∩) or Applied (Δ)) with ice lollies and an extended lunch break to eat them in. In this company, the contracts that the sales teams bid for are categorised as either “off-the-shelf” (contracts which the company often bids for, with templates in the bidding procedure that the sales team can call upon) or “difficult” (contracts which need more of a bespoke response). The outcome of any bid is binary: the team either wins or loses the contract.
This creates the first question: how should the CEO define the “best” team? As it’s hot, the CEO turns to their management dashboard and works out the success rate for each team for each type of contract, which is shown in Table 1:
Sales Team | Contract Category | Win | Lose | Total | Success rate |
---|---|---|---|---|---|
Pure: ∩ | Off the shelf | 27 | 3 | 30 | 0.9 |
Pure: ∩ | Difficult | 28 | 42 | 70 | 0.4 |
Applied: Δ | Off the shelf | 56 | 14 | 70 | 0.8 |
Applied: Δ | Difficult | 6 | 24 | 30 | 0.2 |
“Simple!” thinks the CEO: the Pure team has a better success rate for both the “off-the-shelf” and the “difficult” contracts. They’re the best team and deserve the extra bonus. However, knowing that giving out ice lollies to some and not others in the workforce can create tension, the CEO decides to do a quick double check that the Pure team really are “the best”. Being a busy CEO, they decide the check will be which team won the most contracts overall, found by making a cross-classification table. This is a table which uses all the information in Table 1, but just regroups it in a different way. I’ve shown the sums to help you see how the data in Table 1 has been used to work out which team won the most contracts (Table 2).
Sales Team | Contract: Win | Contract: Lose | Total |
---|---|---|---|
Pure: ∩ | 27+28=55 | 3+42=45 | 100 |
Applied: Δ | 56+6=62 | 14+24=38 | 100 |
Looking at this information creates doubt in the mind of the CEO: the Pure team has the highest success rate but the Applied team have won more overall. Now who should be given the ice lollies?
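The CEO's two checks can be reproduced in a few lines. The following is a minimal sketch, assuming the counts in Table 1; the names `BIDS`, `category_rates` and `overall_rate` are made up for illustration. It works out each team's success rate within a contract category and then the rate once the categories are pooled.

```python
from fractions import Fraction

# (wins, losses) per team and contract category, copied from Table 1.
BIDS = {
    "Pure": {"Off the shelf": (27, 3), "Difficult": (28, 42)},
    "Applied": {"Off the shelf": (56, 14), "Difficult": (6, 24)},
}


def success_rate(wins, losses):
    """Exact fraction of bids that were won."""
    return Fraction(wins, wins + losses)


def category_rates(team):
    """Success rate of one team within each contract category."""
    return {cat: success_rate(w, l) for cat, (w, l) in BIDS[team].items()}


def overall_rate(team):
    """Success rate of one team with both categories pooled together."""
    wins = sum(w for w, _ in BIDS[team].values())
    losses = sum(l for _, l in BIDS[team].values())
    return success_rate(wins, losses)


for team in BIDS:
    per_category = {cat: float(r) for cat, r in category_rates(team).items()}
    print(team, per_category, "overall:", float(overall_rate(team)))
```

Within each category the Pure team's rate is the larger one, yet the pooled rate is larger for the Applied team, which is exactly the reversal that is puzzling the CEO.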
Not wanting to get this wrong, and seeing as each sales team has now been “the best”, the CEO decides that an ever-so-fair tie breaker is required. But what should this be? The CEO knows that there is a difference in the amount of effort required to bid for an off-the-shelf or a difficult contract. Unsure of how many of each the company has bid for, they investigate and create another cross-classification table:
Contract Category | Contract: Win | Contract: Lose | Total |
---|---|---|---|
Off the shelf | 27+56=83 | 3+14=17 | 100 |
Difficult | 28+6=34 | 42+24=66 | 100 |
Overall, the same number of difficult and off-the-shelf contracts have been bid for with more “off-the-shelf” contracts having been won. In fact the win rate for the off-the-shelf contracts is really quite high at $\frac{83}{100}$ and gives the CEO some satisfaction that this portion of the bidding pipeline seems to be working well. In fact, they decide it is the team that has contributed most to these 83 wins that will get the ice lollies. How can they work that out?
What is needed is one last cross-classification table, which the CEO creates (whilst admitting they now really want an ice lolly themselves):
Sales Team | Contract Category: Off the shelf | Contract Category: Difficult | Total |
---|---|---|---|
Pure: ∩ | 27+3=30 | 28+42=70 | 100 |
Applied: Δ | 56+14=70 | 6+24=30 | 100 |
Brilliant! New insights, and the decision has become even more complex. The Applied team definitely contributed most to the 83 off-the-shelf wins. However… the Pure team clearly focused their efforts on bidding for the difficult contracts, the ones that need a more bespoke response.
Perhaps, decides the CEO (putting the kettle on), it’s time to bring some statistics to the problem. After all, that approach is not named in the sales teams and they’ve often found that statistics can bring a beautiful clarity to a problem. Secretly being a fan of conditional probability, the CEO decides to go back to basics and answer a fundamental question: what is the probability that a contract will be won, given that the Pure or the Applied team has bid for it?
Grabbing a pen and paper, the CEO calculates that the probability a contract will be won, given that the Pure team has bid for it is:
\[ P(\text{Contract} = \text{win} \mid \text{Team} = \text{Pure}) = \left( \frac{27}{30} \times \frac{30}{100} \right) + \left( \frac{28}{70} \times \frac{70}{100} \right) = 0.27 + 0.28 = 0.55 \]
Then they calculate that the probability a contract will be won, given that the Applied team put in the bid is:
\[ P(\text{Contract} = \text{win} \mid \text{Team} = \text{Applied}) = \left( \frac{56}{70} \times \frac{70}{100} \right) + \left( \frac{6}{30} \times \frac{30}{100} \right) = 0.56 + 0.06 = 0.62 \]
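Each of these sums is a weighted average: the success rate within a category, weighted by the share of the team's bids that fell in that category. Here is a rough sketch of that calculation using exact fractions, assuming the success rates from Table 1 and the bid counts from the final cross-classification table. The names `WIN_GIVEN_CATEGORY`, `CATEGORY_SHARE`, `p_win` and `p_win_with_mix` are invented for this example, and the 50/50 mix at the end is a purely hypothetical "what if", not something the CEO's dashboard contains.

```python
from fractions import Fraction

# P(win | team, category): the success rates from Table 1.
WIN_GIVEN_CATEGORY = {
    "Pure": {"Off the shelf": Fraction(27, 30), "Difficult": Fraction(28, 70)},
    "Applied": {"Off the shelf": Fraction(56, 70), "Difficult": Fraction(6, 30)},
}

# P(category | team): the share of each team's 100 bids in each category.
CATEGORY_SHARE = {
    "Pure": {"Off the shelf": Fraction(30, 100), "Difficult": Fraction(70, 100)},
    "Applied": {"Off the shelf": Fraction(70, 100), "Difficult": Fraction(30, 100)},
}


def p_win_with_mix(team, mix):
    """Weighted average of a team's category success rates under a given mix."""
    return sum(WIN_GIVEN_CATEGORY[team][cat] * share for cat, share in mix.items())


def p_win(team):
    """P(win | team), using the mix of contracts the team actually bid for."""
    return p_win_with_mix(team, CATEGORY_SHARE[team])


# Hypothetical mix (an assumption for illustration): half of each category.
EQUAL_MIX = {"Off the shelf": Fraction(1, 2), "Difficult": Fraction(1, 2)}

for team in WIN_GIVEN_CATEGORY:
    print(team, float(p_win(team)), float(p_win_with_mix(team, EQUAL_MIX)))
```

With the actual mixes the results are 55/100 for Pure and 62/100 for Applied, the same as the overall win totals in Table 2. Under the hypothetical equal mix the order flips to 13/20 against 1/2, which suggests that the difference in what each team chose to bid for is doing most of the work.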
What should the CEO do? The probability that a contract is won is higher if the Applied team puts in the bid; they also won more of the off-the-shelf contracts and more contracts overall. However, the Pure team has the higher success rate for both the off-the-shelf and the difficult contracts, and focused their efforts on the difficult contracts. Depending on how the data is grouped, the CEO finds a different sales team is “the best”. Why is this?
Simpson’s paradox occurs when something hidden, which we’ve not seen or accounted for, affects the direction of the data (variables) we’re looking at – that is, the variables are not independent but rather explanatory. In this example, there is a wide range of factors that could affect which contracts each team decides to bid for and actually wins, for example: team culture; the value of each contract; the time and resources needed for each bid; the time to close the bid; how many contracts need to be in the overall pipeline to ensure cash flow; and so on. The CEO is focused on operational efficiency, and the information they selected from the management dashboard only showed the counts of bids won and lost for the two different types of contracts.
Perhaps in this circumstance, the most prudent course of action is ice lollies for everyone.
Becky Warren – My favourite rep-tile
Becky Warren is a freelance maths educator and communicator. She loves neat mathematical puzzles, mathematical art and origami. Her occasionally updated blog is called Lines Curves Spirals. She’s @becky_k_warren on Twitter, where she loves joining in the #beingmathematical chat on Thursdays.
I was reminded recently of my favourite rep-tile. No, I’m not talking about lizards here; I am talking about a shape that can be tiled onto an enlargement of itself. There are a number of these: for example, every triangle, square, rectangle, parallelogram and rhombus has this property. There are also a few trapeziums and some more complex shapes.
My favourite rep-tile is the L-triomino. So called because it is made of 3 unit squares in an L shape.
The simplest enlargement takes 4 L-triominoes and combines them to make a double sized L.
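One way to see this arrangement without a picture is to write the four pieces down as sets of unit squares and check that they fit. The sketch below does that for one particular layout; the coordinates and the labels A to D are my own choice for illustration, and other layouts obtained by reflection work just as well.

```python
# Each piece is a set of (x, y) unit squares; the labels A-D are arbitrary.
PIECES = {
    "A": {(0, 0), (1, 0), (0, 1)},  # sits in the corner of the big L
    "B": {(1, 1), (2, 1), (1, 2)},  # same orientation, shifted into the middle
    "C": {(2, 0), (3, 0), (3, 1)},  # fills the end of the horizontal arm
    "D": {(0, 2), (0, 3), (1, 3)},  # fills the end of the vertical arm
}

# The double-sized L: a 4x4 square with its top-right 2x2 block removed.
DOUBLE_L = {(x, y) for x in range(4) for y in range(4) if x < 2 or y < 2}


def is_l_triomino(cells):
    """True if cells are three squares of a 2x2 block, i.e. an L in some rotation."""
    xs = {x for x, _ in cells}
    ys = {y for _, y in cells}
    fits_2x2 = max(xs) - min(xs) == 1 and max(ys) - min(ys) == 1
    return len(cells) == 3 and fits_2x2


def is_tiling(pieces, region):
    """True if the pieces cover the region exactly, with no overlaps."""
    covered = [cell for piece in pieces.values() for cell in piece]
    return len(covered) == len(set(covered)) and set(covered) == region


def render(pieces, size=4):
    """Draw the pieces as text, top row first, with '.' for empty squares."""
    label = {cell: name for name, piece in pieces.items() for cell in piece}
    rows = []
    for y in reversed(range(size)):
        rows.append(" ".join(label.get((x, y), ".") for x in range(size)))
    return "\n".join(rows)


print(all(is_l_triomino(p) for p in PIECES.values()), is_tiling(PIECES, DOUBLE_L))
print(render(PIECES))
```

The printed grid shows piece B nestled into the inner corner, with A, C and D filling the corner and the two ends of the big L.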
This rep-tile is quite special as you can make any integer enlargement of the original shape.
How many can you make?
HINTS:
You might like to start by looking at L’s with a size of $2^n$.
You could then look at L’s with a size of $3 \times 2^n$.
What sizes could you look at next?
I will leave you with a puzzle:
If you have an 8 by 8 chessboard that has had one corner square removed, is it possible to completely cover the rest with L-triominoes?
So, which bit of maths do you want to win? Vote now!
Match 26: Semi-final 2 - Sophie Carr vs Becky Warren
- Sophie with Simpson's paradox (72%, 165 Votes)
- Becky with rep-tiles (28%, 63 Votes)
Total Voters: 228
This poll is closed.
The poll closes at 9am BST on the 28th. Whoever wins the most votes will win the match and go through to the final.
Come back on Tuesday for the grand final!