A few days ago on this blog, in Binomial distribution and Mega-Millions Lottery ROI, I wrote about statistically predicting the number of winners in a lottery, and the impact this has on the mean return-on-investment of a lottery ticket purchase. Now that the outcome of last week’s record-setting Mega-Millions drawing is known, we can compare the actual outcome with the prediction made by the binomial distribution model.
A news article by Margery A. Beck of the Associated Press claims that 1.46 billion tickets were sold in the run-up to last week’s drawing. The article was cited by several major newspapers, including the Los Angeles Times. If that number is credible, it is a spectacular two-day increase from the 50M tickets that had been sold as of Tuesday according to an article in Wednesday’s San Jose Mercury News. It far exceeds the upper end of 250M that I used in my analysis last week. However, the post-hoc analysis that follows suggests that the number was more likely around 540 million, just over a third of the Associated Press’s estimate.
Using the Associated Press’s estimate, the
Binomial( 1.46B, 1/176M) distribution looks like:
This graph shows a 14% chance that the grand prize would have to be split 8 ways, an 8% chance of an 11-way split, and so on.
It has now been reported that there were three winning tickets. So if 1.46B tickets sold is really a credible number, these winners have beaten the odds in more ways than one! There was only a 3.5% chance of having 3 or fewer winners, so those who did win avoided the split of 8 or more ways that they should have expected. According to the binomial model, a split of 14 or more ways was about equally likely.
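The percentages above can be checked without Analytica. What follows is a minimal Python sketch, not the model used for the graph: it evaluates the exact binomial probabilities in log space, using the rounded inputs from this post (1.46B tickets, odds of 1 in 176M). The helper `binom_pmf` and the variable names are my own illustrative choices.

```python
import math

def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of exactly k successes (log space)."""
    # log C(n, k) = sum_{i<k} log(n - i) - log(k!)
    log_comb = sum(math.log(n - i) for i in range(k)) - math.lgamma(k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))

N_TICKETS = 1_460_000_000     # Associated Press figure quoted above
P_JACKPOT = 1 / 176_000_000   # rounded jackpot odds used in this post

pmf = [binom_pmf(k, N_TICKETS, P_JACKPOT) for k in range(14)]  # k = 0..13
p_8_way = pmf[8]                # exactly 8 winning tickets
p_11_way = pmf[11]              # exactly 11 winning tickets
p_3_or_fewer = sum(pmf[:4])     # 0, 1, 2 or 3 winning tickets
p_14_or_more = 1.0 - sum(pmf)   # upper tail, 14 or more winning tickets

print(f"P(8 winners)          = {p_8_way:.3f}")
print(f"P(11 winners)         = {p_11_way:.3f}")
print(f"P(3 or fewer winners) = {p_3_or_fewer:.3f}")
print(f"P(14 or more winners) = {p_14_or_more:.3f}")
```

With an expected count of only about 8 winners, a Poisson approximation would give nearly the same values; the exact form is used here only to stay close to the Binomial expression above.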
Because the actual outcome is so unlikely according to the binomial model, we should question whether this reported estimate for the number of tickets sold is realistic, or whether the binomial model itself is flawed. It is entirely possible that both are fine, and that a rare event really did occur, but the value of a post-hoc analysis is that it often uncovers deviations from the underlying assumptions, and that itself can lead to additional insight.
As tickets are sold, a portion of each ticket’s price is added to the grand prize total. An article by Marc McAfee and Jon Shirek of WXIA in Atlanta states, in the context of the Mega-Millions drawing, that
“Out of every $1 spent on a lottery ticket in Georgia, most of it goes toward the prize money; roughly 60 cents out of every dollar becomes part of the jackpot.”
If so, and if $1.49B were spent in the final days, then we should have seen the lump-sum grand prize total increase by about $894M. In fact, it increased by about $200M from Tuesday to Friday. I haven’t been able to find a trustworthy accounting of how the proceeds from each ticket are allocated to each week’s jackpot, but this lower-than-expected increase suggests inconsistencies in the numbers being reported.
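The arithmetic behind that comparison is short enough to spell out. This is only a back-of-the-envelope sketch: it assumes the 60-cent figure from the WXIA quote applies directly to the lump-sum jackpot, which I have not been able to confirm, and the variable names are illustrative.

```python
JACKPOT_SHARE = 0.60        # per the WXIA quote; assumed to apply to the lump sum
REPORTED_SALES = 1.49e9     # dollars reportedly spent in the final days
OBSERVED_INCREASE = 200e6   # approximate jackpot growth, Tuesday to Friday

# Jackpot growth we would expect if the reported sales figure were right.
expected_increase = JACKPOT_SHARE * REPORTED_SALES

# Sales that the observed growth would imply under the same 60% assumption.
implied_sales = OBSERVED_INCREASE / JACKPOT_SHARE

print(f"Expected jackpot increase: ${expected_increase / 1e6:,.0f}M")
print(f"Sales implied by observed increase: ${implied_sales / 1e6:,.0f}M")
```

The implied sales figure depends entirely on the 60% assumption, so I would read it as a rough consistency check rather than as an estimate.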
Another post-hoc validation of the binomial model can be obtained by looking at how many tickets won the second-tier prize, that is, matched 5 of 5 white numbers but not the mega number, at odds of 1 in 3,904,701. This second-tier prize has a $250K payout. The Los Angeles Times reports that 300M tickets were sold in California alone, and that 29 people in California hit this second-tier prize. The
Binomial(300M, 1/3.9M) distribution is:
The distribution suggests that, if 300M tickets were indeed sold, an outcome of only 29 second-tier winners is implausible in the extreme: there is about 1 chance in 30M that fewer than 30 people would have a 5+0 match. In this case, we cannot reasonably attribute the result to a statistical anomaly.
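The same kind of check can be sketched in Python for the second-tier prize. This assumes the Los Angeles Times figure of 300M California tickets and the stated odds of 1 in 3,904,701; `binom_pmf` is an illustrative helper, not part of any library.

```python
import math

def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of exactly k successes (log space)."""
    log_comb = sum(math.log(n - i) for i in range(k)) - math.lgamma(k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))

N_CA_TICKETS = 300_000_000   # Los Angeles Times figure for California
P_SECOND_TIER = 1 / 3_904_701

expected_winners = N_CA_TICKETS * P_SECOND_TIER

# Lower tail: probability that fewer than 30 tickets match 5+0.
p_fewer_than_30 = sum(binom_pmf(k, N_CA_TICKETS, P_SECOND_TIER) for k in range(30))

print(f"Expected second-tier winners: {expected_winners:.1f}")
print(f"P(fewer than 30 winners) = {p_fewer_than_30:.3g}")
print(f"That is about 1 chance in {1 / p_fewer_than_30:,.0f}")
```

An exact sum like this may not reproduce the rounded figure quoted above, but any probability this small leads to the same conclusion.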
The first possible way to resolve these discrepancies is to challenge the numbers reported in the news regarding the actual number of tickets sold. If these news reports overestimated by a factor of about 2.8, then the binomial model and the reported numbers of winners are in excellent agreement. Try it for yourself: in Analytica, view the probability mass result for
Binomial( 1.5B / 2.8, 1/176M ) and
Binomial( 300M/2.8, 1/3.9M ).
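For readers without Analytica, here is a rough Python counterpart to those two expressions. It is a sketch under the same rounded inputs (1.5B and 300M tickets, odds of 1/176M and 1/3.9M, a correction factor of 2.8), with ticket counts rounded to whole numbers; the helper and variable names are illustrative.

```python
import math

def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of exactly k successes (log space)."""
    log_comb = sum(math.log(n - i) for i in range(k)) - math.lgamma(k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))

FACTOR = 2.8  # suspected overestimate in the reported ticket counts

# Jackpot: Binomial(1.5B / 2.8, 1/176M), observed 3 winning tickets.
n_total = round(1.5e9 / FACTOR)
p_jackpot = 1 / 176e6
mean_jackpot = n_total * p_jackpot
p_three_winners = binom_pmf(3, n_total, p_jackpot)

# Second tier in California: Binomial(300M / 2.8, 1/3.9M), observed 29 winners.
n_ca = round(300e6 / FACTOR)
p_second = 1 / 3.9e6
mean_second = n_ca * p_second
p_29_winners = binom_pmf(29, n_ca, p_second)

print(f"Jackpot: expected {mean_jackpot:.2f} winners, P(3) = {p_three_winners:.3f}")
print(f"Second tier: expected {mean_second:.1f} winners, P(29) = {p_29_winners:.3f}")
```

Both observed counts land close to the expected values of the scaled-down distributions, which is what "excellent agreement" means here.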
The second possible resolution is that the assumptions behind the binomial model do not hold. The binomial model assumes not only that all possible number combinations are equally likely to be drawn, but also that the number combinations people play are uniformly distributed. There is little reason to doubt the first assumption, but if people tend to play dates and regular sequences, ticket patterns may be highly non-uniform. Accounting for such non-uniformity would be an interesting modeling exercise. There is no doubt that some degree of non-uniformity is present, but I have a hard time believing it would be substantial enough to account for the discrepancies seen here.
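To give a feel for what such a modeling exercise might look like, here is a toy two-group sketch. Everything about the split is hypothetical: I simply assume that 10% of the combinations are "popular" and attract 20% of the tickets, and I use a Poisson approximation to the binomial. It is not calibrated to any real data on how people pick numbers.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

LAM = 1.46e9 / 176e6      # expected jackpot winners if play were uniform
POPULAR_COMBOS = 0.10     # hypothetical share of combinations that are popular
POPULAR_TICKETS = 0.20    # hypothetical share of tickets played on them

# Expected winners given that the drawn combination is popular / not popular.
rate_popular = LAM * POPULAR_TICKETS / POPULAR_COMBOS
rate_other = LAM * (1 - POPULAR_TICKETS) / (1 - POPULAR_COMBOS)

# Chance of 3 or fewer winners: uniform play versus the two-group mixture.
p_uniform = poisson_cdf(3, LAM)
p_skewed = (POPULAR_COMBOS * poisson_cdf(3, rate_popular)
            + (1 - POPULAR_COMBOS) * poisson_cdf(3, rate_other))

print(f"P(3 or fewer winners), uniform play: {p_uniform:.3f}")
print(f"P(3 or fewer winners), skewed play:  {p_skewed:.3f}")
```

Under these made-up parameters the chance of three or fewer winners rises somewhat but stays small. Stronger skew would move it further, so the conclusion depends on how non-uniform real ticket choices are.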
My opinion is that the news overestimated the number of tickets sold. While it is not uncommon for poor estimates to appear in the media, detecting them is not always easy. This post-hoc analysis serves as an example in which a bit of simple quantitative modeling provides the tool needed to spot bad information in the media, even when it is widely reported. Applying the binomial model in reverse, we find it more plausible that about 540M tickets were sold in the run-up to Friday’s lottery, far fewer than the 1.5B reported by the Associated Press.