A Mathematical Understanding of the Powerball Lottery
December 27, 2018
The popular Powerball lottery offered in the United States has, in the past years, evolved to accommodate larger payouts through several iterative adjustments to the game’s structure. The most recent adjustment took effect with the drawing on October 7th, 2015. This probabilistic alteration opened the opportunity for players to win payouts in excess of $1 billion.
This post is the first in a series and seeks to explain how to calculate and comprehend the underlying probabilities of the Powerball lottery game. If you prefer to walk through the blog post computationally, I have placed code snippets within this article for you to run on your own. I will be using Python 3.7 for this project. Access the associated Jupyter Notebook for the project here.
Brief History of the Powerball Lottery Game
The Powerball came to fruition through a collaborative effort between the Multi-State Lottery Association (a nonprofit organization) and an agreement with the other US lotteries. As the name of the organization suggests, the Powerball is a multi-state lottery game offered to players across 44 states in the US.
The first Powerball drawing was held in 1992. Unlike other lottery games of the time, the Powerball was the first to use two different pools of balls from which the winning numbers were selected. This was especially important for the Multi-State Lottery Association because the two-pool system offered more variation in the numbers for players to select from, implicitly leading to longer jackpot odds (a lower probability of winning the jackpot).
The Game Structure
For those who are not familiar with how to play the Powerball lottery, the game structure is very simple. Each player chooses two sets of numbers from two different pools. One can think of a pool as a range from 1 to N, meaning that every integer x in the pool is greater than or equal to 1 and less than or equal to N.
The players can select their two different sets of numbers by using the “Quick Pick” option, which allows a machine to choose random values for their ticket, or the players can manually create their own selection matrix.
Powerball Ticket Combinations
For the first pool (the FirstPool), or the white balls in the Powerball, N = 69, making the range of possible selections 1 to 69. The player must select five different numbers from the first pool.
These five numbers, regardless of their order, must match the five numbers drawn from the FirstPool by the lottery for a chance to win the entire jackpot. In game play, the first number is drawn from 69 balls, so a specific ball has a 1 in 69 chance of being drawn first. The second number is then drawn from the remaining 68 balls, a 1 in 68 chance. This type of selection is known as sampling without replacement. Continuing this logic across all five numbers, we arrive at 69 x 68 x 67 x 66 x 65 possible ordered draws.
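The counting argument above can be checked with a short loop. This is a minimal sketch; the helper name `ordered_draws` is an assumption made for illustration and is not part of the original notebook.

```python
def ordered_draws(pool_size, picks):
    """Count ordered selections without replacement: n * (n-1) * ... * (n-k+1)."""
    total = 1
    for i in range(picks):
        # Each draw removes one ball, so the pool shrinks by one every time.
        total *= pool_size - i
    return total


print(ordered_draws(69, 5))  # 1348621560
```

The result counts every ordering of the same five numbers separately, which is why it is far larger than the number of distinct tickets.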
This expression can be articulated mathematically using enumerative combinatorics. Combinatorics, broadly, is the branch of mathematics concerned with counting and with extrapolating results from finite sets. Enumerative combinatorics offers the twelvefold way, a framework for counting permutations (ordered, sequential arrangements of the components of a set), combinations (arrangements of a set without reference to the order of the components), and partitions. We will use this framework to define the range of possible combinations in the Powerball lottery.
We can think of the range of possible numbers to be drawn as an immutable set S of integers whose bounds are defined in the aforementioned FirstPool definition. A k-combination is a subset of S with k distinct elements. Because the set runs from 1 to its maximum, n is both the inclusive maximum element and the size of the set.
For the Powerball’s FirstPool we can define n as equal to 69 and k as equal to 5. I will acknowledge that factorials are computationally slow, but we use them here for the sake of simple explanation. Using the previously defined n and k within the equation below, we arrive at the total number of ordered selections (that is, combinations with their permutations counted separately) within the Powerball lottery.
The computational version of this calculation takes an optional argument that includes or excludes permutations.
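Written out, and assuming the intended expression is the standard count of ordered selections of k items from n (which is what the perm=True branch of binomial_coefficient computes), the formula would read:

```latex
P(n, k) = \frac{n!}{(n-k)!}
\qquad\Longrightarrow\qquad
P(69, 5) = \frac{69!}{64!} = 69 \cdot 68 \cdot 67 \cdot 66 \cdot 65
```

The symbol P(n, k) is notation chosen here for illustration; the post itself does not name this quantity.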
```python
from scipy.special import factorial


def binomial_coefficient(n, k, perm=False):
    '''Calculates the Binomial Coefficient (n choose k).

    Keyword Arguments:
    n = denotes the maximum value in a range from (1,n)
    k = denotes the number of values selected from the n range
    perm = denotes whether permutations are included
    '''
    # exact=True keeps the factorials as Python integers, avoiding float rounding.
    if perm:
        # Ordered selections: n! / (n-k)!
        return factorial(n, exact=True) // factorial(n - k, exact=True)
    # Unordered selections: n! / (k! * (n-k)!)
    return factorial(n, exact=True) // (
        factorial(k, exact=True) * factorial(n - k, exact=True)
    )
```
Using this function, we can calculate the number of ordered selections (combinations together with their permutations) in the FirstPool of the Powerball lottery.
This leads to 1,348,621,560 ordered selections. To restate once more: to win a prize from the Powerball lottery, the order in which your numbers are selected is irrelevant, and thus we can eliminate the permutations. We do so by multiplying the denominator by k!, which the binomial_coefficient function does by default (perm=False). This works because any set of k numbers could have been drawn in k! different orders. Thus, dividing 1,348,621,560 by k! = 5! = 120 removes the effect of the permutations. This factorial expression of the k-combination calculation in Eq. 2 is equivalent to the binomial coefficient. Throughout the remainder of this article we will use the binomial coefficient notation.
Using this function, we can calculate the number of possible combinations in the FirstPool of the Powerball lottery.
This leads to 11,238,513 combinations for the Powerball lottery’s first five numbers. The binomial coefficient can also be thought of as a product of probabilities. For the Powerball, the probability that the first ball drawn matches one of your five numbers is 5 out of 69, the next is 4 out of 68, and so on, so that (5/69) x (4/68) x (3/67) x (2/66) x (1/65) = 1/11,238,513.
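Assuming Eq. 2 is the usual factorial form of the k-combination count, it can be written alongside the binomial coefficient notation as follows. The numeric line simply plugs in n = 69 and k = 5.

```latex
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\qquad\Longrightarrow\qquad
\binom{69}{5} = \frac{69!}{5!\,64!} = \frac{1{,}348{,}621{,}560}{120} = 11{,}238{,}513
```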
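The product-of-probabilities view can be verified with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the function name `jackpot_white_ball_probability` is an illustrative assumption, not something from the original notebook.

```python
from fractions import Fraction


def jackpot_white_ball_probability(pool_size=69, picks=5):
    """Multiply the chance that each successive drawn ball is on the ticket."""
    probability = Fraction(1)
    for i in range(picks):
        # (picks - i) unmatched ticket numbers remain among (pool_size - i) balls.
        probability *= Fraction(picks - i, pool_size - i)
    return probability


print(jackpot_white_ball_probability())  # 1/11238513
```

Because `Fraction` reduces automatically, the result is exactly one over the number of combinations, with no floating-point rounding involved.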
When considering the Powerball, simply calculating the number of combinations across the first pool is not enough. There is a second pool from which a red ball is drawn. The red ball is the Powerball. The number of balls in the second pool is 26, making N = 26, with possible selections ranging from 1 to 26. The player must select just one number from the second pool.
In order to calculate the number of combinations including the second pool we modify the aforementioned probabilistic equation slightly.
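Assuming the modification is simply a multiplication by the number of ways to choose one ball from the second pool, the count can be written as:

```latex
\binom{69}{5}\,\binom{26}{1} = 11{,}238{,}513 \times 26 = 292{,}201{,}338
```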
With the addition of the second pool, the number of possible combinations increases to 292,201,338. We can refer to the span of all of the different combinations as the number space. The ratio of the number of combinations in play for one drawing to the number space is denoted the coverage of a lottery.
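As a rough illustration of coverage, the sketch below divides a count of distinct combinations in play by the number space. The ticket count is purely hypothetical, and the names `combinations`, `NUMBER_SPACE`, and `coverage` are assumptions introduced for this example.

```python
from math import factorial


def combinations(n, k):
    """Number of ways to choose k items from n, ignoring order."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Five white balls from 69, one red ball from 26.
NUMBER_SPACE = combinations(69, 5) * combinations(26, 1)


def coverage(distinct_combinations, number_space=NUMBER_SPACE):
    """Share of the number space covered by the combinations in play."""
    return distinct_combinations / number_space


# Hypothetical drawing with 100 million distinct combinations sold.
print(f"{coverage(100_000_000):.1%}")
```

Note that this counts distinct combinations. Tickets that duplicate one another add nothing to coverage, so real coverage is lower than tickets sold divided by the number space.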
The Odds of Winning
Having discussed how to calculate the possible combinations in the Powerball, let us end by discussing the odds of winning. For now let us consider only the five white balls in the FirstPool. We will discuss the inclusion of the Powerball momentarily. We start with the total number of combinations that are possible when all five white balls are selected.
After that, we define an expression for the number of ways of choosing n winning numbers out of the 5 winning white balls of the Powerball (here n counts the matching balls, not the pool size).
From this expression we can also think about the complement: the numbers on the ticket that do not match. There are 5 - n losing numbers on the ticket, chosen from the 69 - 5 = 64 losing numbers in the pool.
Multiplying the number of ways of choosing n winning numbers by the number of ways of choosing 5 - n losing numbers, and dividing by the total number of combinations, we arrive at the probability of choosing exactly n correct numbers.
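Putting the three pieces together, and assuming the expressions are the standard binomial counts described in the text, the probability for the white balls can be written as:

```latex
\Pr(n \text{ matches}) = \frac{\binom{5}{n}\,\binom{64}{5-n}}{\binom{69}{5}},
\qquad n = 0, 1, \dots, 5
```

The "1 in X" odds quoted for a lottery are the reciprocal of this probability.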
This is an instance of a generalized equation known as the hypergeometric distribution, where N is the number of balls in a single pool, K is the number of balls in a selection from a single pool, and B is the number of matching balls on a single draw.
This equation would work perfectly except for one final caveat: the Powerball lottery has two pools from which balls are selected. Just as before, adapting the equation to accommodate two pools is not difficult. Because only one ball is selected from the second pool, we can take the probability of it being selected, 1 in P, and multiply it by the hypergeometric probability. This modification allows for the selection of 1 ball out of a pool of P, along with the other aforementioned variables.
To calculate the odds of selecting correctly from the FirstPool but incorrectly from the SecondPool, we again modify the equation, replacing the factor 1/P with (P - 1)/P.
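A minimal sketch of the single-pool hypergeometric calculation follows, using only the standard library. The names `choose` and `match_probability` are assumptions for illustration; the parameters N, K, and B follow the definitions in the text.

```python
from fractions import Fraction
from math import factorial


def choose(n, k):
    """Binomial coefficient; zero when k falls outside 0..n."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


def match_probability(N, K, B):
    """Probability of matching exactly B of the K drawn balls from a pool of N."""
    return Fraction(choose(K, B) * choose(N - K, K - B), choose(N, K))


# Probabilities for 0 through 5 matching white balls in the current game.
white = [match_probability(69, 5, b) for b in range(6)]
for b, p in enumerate(white):
    print(f"{b} matches: 1 in {float(1 / p):,.2f}")
```

Since the six outcomes are exhaustive and mutually exclusive, the probabilities sum to exactly 1, which is a convenient sanity check on the formula.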
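In symbols, and assuming the second pool contributes an independent factor of 1/P as described, the probability of matching B white balls together with the Powerball would be:

```latex
\Pr(B \text{ white balls and the Powerball})
  = \frac{\binom{K}{B}\,\binom{N-K}{K-B}}{\binom{N}{K}} \cdot \frac{1}{P}
```

For the jackpot (N = 69, K = 5, B = 5, P = 26) this reduces to 1 in 292,201,338, matching the number space computed earlier.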
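Under the same assumptions, missing the Powerball swaps the factor 1/P for the probability of picking any of the other P - 1 red balls:

```latex
\Pr(B \text{ white balls, Powerball missed})
  = \frac{\binom{K}{B}\,\binom{N-K}{K-B}}{\binom{N}{K}} \cdot \frac{P-1}{P}
```

Adding this to the probability of matching B white balls with the Powerball recovers the single-pool hypergeometric probability.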
Computationally, we can derive each of the above equations using the powerball_odds function.
```python
def powerball_odds(n, k, pmax=0):
    '''Calculates the range of odds of drawing X correct balls in a
    non-powerball or one-powerball style lottery.

    Keyword Arguments:
    n: The total number of balls in the lottery's drawing
    k: The total number of balls that a player can draw out of the n pool
    pmax (optional): The maximum value if there is a powerball drawn
        from a separate pool.
    '''
    odds_list = []     # odds without the powerball (or all odds if pmax == 0)
    odds_list_PB = []  # odds with the correct powerball
    m = n - k  # number of losing balls in the pool
    odds_c = binomial_coefficient(n, k)  # total combinations
    for i in range(k + 1):
        f = k - i  # losing numbers on a ticket with i matches
        odds_a = binomial_coefficient(k, i)
        odds_b = binomial_coefficient(m, f)
        odds = 1 / ((odds_a * odds_b) / odds_c)  # "1 in odds" for i matches
        if pmax == 0:
            odds_list.append(odds)
        else:
            odds_list_PB.append(odds * pmax)
            odds_list.append(odds * pmax / (pmax - 1))
    return odds_list, odds_list_PB
```
Running the code snippet above yields two outputs: two lists of the odds for each type of ticket combination. If you are running the associated Jupyter notebook, follow the cleaning steps and you will arrive at DataFrames that resemble the following tables.
Table 1: The output of the odds for the current Powerball Lottery
|Output 1: N=69 | P=26|
Prior to October 7th, 2015, the Powerball lottery had the FirstPool ranging only to 59, and had the SecondPool ranging to 35. The output will allow you to visualize how the change in pool sizes changed the odds of the game and has helped to produce several enormous Powerball payouts!
Table 2: The output of the odds for the system that was in place for the Powerball Lottery from January 15, 2012 – October 3, 2015
|Output 1: N=59 | P=35|
A Quick Comparison between the two tables
Running through the associated Jupyter notebook, we see that the Powerball’s change in pool sizes accomplished two things. First, it improved the odds of holding a winning ticket with the Powerball for tickets with 2 or fewer correct white balls. Second, it lengthened the odds of holding a winning ticket with the Powerball for tickets with 3 or more correct white balls (for example, 3 white balls plus the Powerball moved from roughly 1 in 12,245 to roughly 1 in 14,494). For tickets without the correct Powerball, the odds lengthened for every tier with at least one matching white ball. By enlarging the number space from 175,223,510 to 292,201,338 combinations, the change lowers the coverage that a given number of tickets achieves, which makes rollovers more likely and implicitly leads to higher jackpot payouts.
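The comparison can be reproduced with a self-contained sketch that does not depend on the notebook. The helper names `choose` and `odds_table` are assumptions for illustration. The pool sizes (59 and 35 before the change, 69 and 26 after) come from the text.

```python
from math import factorial


def choose(n, k):
    """Number of ways to choose k items from n, ignoring order."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def odds_table(N, P, K=5):
    """Return {(white_matches, powerball_matched): X} for '1 in X' odds."""
    table = {}
    total = choose(N, K)
    for b in range(K + 1):
        ways = choose(K, b) * choose(N - K, K - b)
        white_odds = total / ways
        table[(b, True)] = white_odds * P             # Powerball matched
        table[(b, False)] = white_odds * P / (P - 1)  # Powerball missed
    return table


old = odds_table(59, 35)  # pool sizes before October 7th, 2015
new = odds_table(69, 26)  # current pool sizes

for key in sorted(new, reverse=True):
    b, pb = key
    label = f"{b} white {'+ PB' if pb else 'only'}"
    trend = "longer" if new[key] > old[key] else "shorter"
    print(f"{label:<12} old 1 in {old[key]:>16,.2f}   "
          f"new 1 in {new[key]:>16,.2f}   ({trend})")
```

Each row reports the "1 in X" odds under both systems, so a "longer" label means that outcome became less likely after the change.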
This article covered how to use enumerative combinatorics to calculate the number of different combinations for each of the two pools in the Powerball lottery, and how to calculate the odds of winning for each type of ticket. Understanding the mathematics lays the foundation for the next article in the Powerball series. In that next article we will learn how to build a web scraper that gathers historical Powerball lottery data.
About the author
My name is Jeremy A. Seibert. I built the Urban Scientist as a site to host my explorations in the intersection between Economics and Data Science. Most of the posts that I write will have an associated Jupyter notebook accompanying them, so feel free to check those out at the Urban Scientist Github. Check out the about page for more info on my background.