A.S.: Yep, apparently the singular of ‘dice’ is ‘die’. 😉
A tricky programming task has been floating around the Internet which, as you might have guessed from my attempt at a pragmatic title, can be formalized as follows:
Given a function that returns a uniformly random number between 1 and 5 (inclusive), design a function that returns a uniformly random number between 1 and 7 (inclusive).
If the intersection of algorithms and statistics are your kind of thing, feel free to give it a go! An extra challenge is to make your algorithm bounded in time. I found this to be a fun exercise, only to realize that there are fundamental limitations at play here.
Of course, a die here means a perfectly uniform random generator that exists in theory only. Real-world dice are governed by natural laws and environmental imperfections in such a way that they are only an approximation (albeit a good one) of this abstract model.
An unbounded solution
Let’s look at a family of unbounded algorithms. In order to create a random generator that has at least 7 outcomes, we will need to roll the 5-sided die at least twice. Rolling it twice creates 5^2=25 possible outcomes, and while this is not a factor of 7, we can assign the first 21 outcomes to 7 groups of equal size 3. If we end up at one of the remaining 4 outcomes however, then the procedure must restart. We cannot simply assign any of these 4 unlucky end states to an output, as this will mess up the uniform randomness property.
In the above chart, the circles represent ‘states’ of our program, and the text within is the most recent value of the rolled die. The green boxes then represent the output we assign depending on the end state (= after rolling the die twice). 84% of the time, we get an output after two rolls and the algorithm finishes. As you can see, a loop is present and makes the algorithm unbounded. This means that for every integer, no matter how large, there is a small but non-zero probability that the total number of rolls will exceed that amount in order to calculate one output value. Real-time applications with strict time requirements are therefore out of the question, unless you are willing to make compromises on the algorithm’s statistical accuracy.
A variant of this solution initially rolls up to three dice before restarting: this creates 5^3=125 end states, and we can assign at most 119 of them to 7 groups of equal size 17. Now, there is a 95.2% probability of finishing after a maximum of three rolls. Note that we have to roll fewer times on average than before (280/119 ≈ 2.35 versus 50/21 ≈ 2.38), because most of the time we can already take a shortcut after the first two rolls, and directly map these in-between states to an output. This average amount of rolls keeps dropping as we continue the trend.
If you are familiar with information theory, then consider the fact that one roll of 5-sided die (= rand5) generates log2(5) ≈ 2.32 bits of information, and one roll of a 7-sided die (= rand7) generates log2(7) ≈ 2.81 bits of information. It turns out that in the limiting case, we need just log(7)/log(5) ≈ 1.21 calls of rand5 for each rand7 value we wish to obtain. The exact implementation that works arbitrarily close to this limit is rather complicated and out of the scope of this article. We would need to store state information in-between multiple calls of rand7, and the algorithm I have in mind is not only still unbounded in time, but also in memory. I gladly invite you to inform me of a method that solves this latter issue!
A bounded solution
There exists no such thing as a bounded algorithm that calculates a perfectly uniform random value between 1 and 7, given a random generator between 1 and 5 only. Consider the family of unbounded solutions above. As I have hinted, at every point there will always be an amount of ‘left-over’ states (ranging anywhere from 1 to 6) that cause the algorithm to restart. This is because 5 and 7 are prime numbers, and no power of 5 can ever be a multiple of 7. However tempting it may be to use Fermat’s little theorem, such attempts will result in off-by-one errors, since basic number theory renders a bounded solution in this family impossible.
You might rightly point out that the number of end states is not always a power of 5, for example in the case of ‘up to three rolls’ where we can take shortcuts after two rolls already. Perhaps we could arrange the state chart as to create a bounded algorithm with variable-length paths and 7n end states? However, consider the following:
- These end states will not be equally likely because not every path has the same length (= amount of rand5 calls).
- We can very easily expand the tree to make every path equally long, by uselessly calling rand5 an appropriate amount of times whenever we are at a premature end state. Now the amount of end states becomes a power of 5 again, and all end states are equally likely.
Of course, this is not a formal proof, but I hope to have given you some intuition about the problem. As we see, sometimes a problem may seem very realistic, only to stumble upon mathematical facts like these. After this analysis, can you figure out whether it is possible to simulate an 8-sided die starting from a 4-sided die?