How many floating-point numbers are in the interval [0,1]?
Most commodity processors support single-precision IEEE 754 floating-point numbers. Though they are ubiquitous, they are often misunderstood.
One of my readers left a comment suggesting that picking an integer in [0,2^32) at random and dividing it by 2^32 was equivalent to picking a number at random in [0,1). I am not assuming that the reader in question made this inference, but it is a tempting one.
That’s certainly “approximately true”, but we are making an error when doing so. How much of an error?
Floating-point numbers are represented with a sign bit, a mantissa and an exponent, as follows (a short decoding sketch follows the list):
- There is a single sign bit. Because we only care about positive numbers, this bit is fixed and can be ignored.
- The mantissa is straightforward: 23 bits. It is implicitly preceded by the number 1.
- There are eight bits dedicated to the exponent. It is pretty much straightforward, as long as the exponent ranges from -126 to 127. Otherwise, you get funny things like infinity, not-a-number, denormal numbers… and zero. To represent zero, you need an exponent value of -127 and a zero mantissa.
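Here is a minimal C sketch of that layout, decoding 0.5f by hand; the field names (sign, exponent, mantissa) are just illustrative:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    float f = 0.5f;
    uint32_t w;
    memcpy(&w, &f, sizeof w); // reinterpret the bits of the float

    uint32_t sign     = w >> 31;                            // 1 bit
    int32_t  exponent = (int32_t)((w >> 23) & 0xFF) - 127;  // 8 bits, biased by 127
    uint32_t mantissa = w & 0x7FFFFF;                       // 23 bits, implicit leading 1

    printf("sign=%u exponent=%d mantissa=0x%06X\n", sign, exponent, mantissa);
    // prints: sign=0 exponent=-1 mantissa=0x000000  (0.5 = 1.0 x 2^-1)
    return 0;
}
```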
So how many “normal” non-zero numbers are there between 0 and 1? The negative exponents range from -1 all the way to -126. In each case, we have 2^23 distinct floating-point numbers because the mantissa is made of 23 bits. So we have 126 x 2^23 normal floating-point numbers in [0,1). If you don’t have a calculator handy, that’s 1,056,964,608. If we want to add the number 1, that’s 126 x 2^23 + 1 or 1,056,964,609.
Most people would consider zero to be a “normal number”, so maybe you want to add it too. Let us make it 1,056,964,610.
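If you would rather not trust the arithmetic, a brute-force check is possible. Positive IEEE 754 floats sort in the same order as their underlying bit patterns, so the interval [0,1] corresponds exactly to the 32-bit patterns from 0x00000000 (0.0f) to 0x3F800000 (1.0f). A short sketch counting the patterns that are zero or normal:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t count = 0;
    // positive floats order like their bit patterns: [0,1] is exactly
    // the patterns 0x00000000 (0.0f) through 0x3F800000 (1.0f)
    for (uint32_t w = 0; w <= 0x3F800000u; w++) {
        uint32_t exponent_field = w >> 23;
        // keep zero and the normal numbers, skip the denormals
        if (w == 0 || exponent_field >= 1) count++;
    }
    printf("%llu\n", (unsigned long long)count); // prints 1056964610
    return 0;
}
```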
There are 4,294,967,296 possible 32-bit words, so about a quarter of them are in the interval [0,1]. Isn’t that interesting? Of all the floating-point numbers your computer can represent, a quarter of them lie in [0,1]. By extension, half of the floating-point numbers are in the interval [-1,1].
So already we can see that we are likely in trouble. The number 2^32 is not divisible by 1,056,964,610, so we can’t take a 32-bit non-negative integer, divide it by 2^32 and hope that this will generate a number in [0,1] in an unbiased way.
How much of a bias is there? We have a unique way of generating the number zero. Meanwhile, there are 193 different ways to generate 0.5: any integer between 2,147,483,584 and 2,147,483,776 (inclusively) gives you 0.5 when divided by the floating-point number 2^32.
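You can verify this count with a quick brute-force scan, a sketch assuming the default round-to-nearest-even mode: try every integer in a window around 2^31 and count how many land exactly on 0.5f.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t hits = 0;
    // scan a window of integers around 2^31 = 2,147,483,648
    for (uint64_t x = 2147483392; x <= 2147484032; x++) {
        float f = (float)x / 4294967296.0f; // divide by 2^32
        if (f == 0.5f) hits++;
    }
    printf("%llu integers map to 0.5f\n", (unsigned long long)hits); // 193
    return 0;
}
```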
A ratio of 1 to 193 is a fairly sizeable bias. So chances are good that your standard library does not generate random numbers in [0,1] in this manner.
How could you get an unbiased map?
We can use the fact that the mantissa spans 23 bits, which gives 24 bits of precision with the implicit leading 1. This means in particular that if you pick any integer in [0,2^24) and divide it by 2^24, then you can recover your original integer by multiplying the result again by 2^24. This works with 2^24 but not with 2^25 or any larger number.
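Here is a small sketch that checks this round-trip claim exhaustively: every integer below 2^24 survives the trip, while with 2^25 every odd integer above 2^24 is lost.

```c
#include <stdint.h>
#include <stdio.h>

// count how many k in [0, 2^n) fail to survive the round trip
// k -> (float)k / 2^n -> * 2^n
static uint64_t roundtrip_failures(uint32_t n) {
    uint64_t failures = 0;
    float scale = (float)(1ull << n); // powers of two are exact in float
    for (uint64_t k = 0; k < (1ull << n); k++) {
        float f = (float)k / scale;
        if ((uint64_t)(f * scale) != k) failures++;
    }
    return failures;
}

int main(void) {
    printf("n=24: %llu failures\n", (unsigned long long)roundtrip_failures(24)); // 0
    printf("n=25: %llu failures\n", (unsigned long long)roundtrip_failures(25)); // 8388608
    return 0;
}
```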
So you can pick a random integer in [0,2^24), divide it by 2^24 and you will get a random number in [0,1) without bias… meaning that for every integer in [0,2^24), there is one and only one floating-point number in [0,1). Moreover, the distribution is uniform in the sense that the possible floating-point numbers are evenly spaced (the distance between them is a flat 2^-24).
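In code, that is just a mask and a multiplication by 2^-24. A minimal sketch, using Marsaglia’s xorshift32 as a stand-in source of random bits (any 32-bit generator would do):

```c
#include <stdint.h>
#include <stdio.h>

// a tiny xorshift32 generator as a stand-in random source (illustrative only)
static uint32_t state = 12345;
static uint32_t random32(void) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// take 24 random bits and scale by 2^-24: every one of the 2^24
// possible outputs in [0,1) is hit with equal probability
static float random_unit_float(void) {
    return (random32() & 0xFFFFFFu) * (1.0f / 16777216.0f);
}

int main(void) {
    for (int i = 0; i < 4; i++)
        printf("%f\n", random_unit_float());
    return 0;
}
```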
So even though single-precision floating-point numbers use 32-bit words, and even though your computer can represent about 2^30 distinct and normal floating-point numbers in [0,1), chances are good that your random generator only produces 2^24 distinct floating-point numbers in the interval [0,1).
For most purposes, that is quite fine. But it could trip you up. A common way to generate random integers in an interval [0,N) is to first generate a random floating-point number and then multiply the result by N. That is going to be fine as long as N is small compared to 2^24, but should N exceed 2^24, then you have a significant bias as you are unable to generate all integers in the interval [0,N).
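The effect is easy to demonstrate. In the sketch below, N = 2^25 is twice too large for the 2^24 possible random floats: every generated integer comes out even, so the odd half of [0,N) is never produced (the choice of N and the exhaustive loop are just for illustration).

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    const float N = 33554432.0f; // 2^25
    uint64_t odd = 0;
    // map every possible 24-bit float outcome into [0, N)
    for (uint32_t k = 0; k < (1u << 24); k++) {
        float f = k / 16777216.0f;      // the 2^24 possible random floats
        uint32_t v = (uint32_t)(f * N); // scale to [0, N)
        if (v & 1) odd++;
    }
    printf("odd values produced: %llu out of %u draws\n",
           (unsigned long long)odd, 1u << 24); // 0: odd integers never occur
    return 0;
}
```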
I did my analysis using 32-bit words, but it is not hard to repeat it for 64-bit words and come to similar conclusions. Instead of generating 2^24 distinct floating-point numbers in the interval [0,1], you would generate 2^53.
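The double-precision analogue takes 53 bits from a 64-bit word and scales by 2^-53. A sketch, using splitmix64 as a stand-in 64-bit source:

```c
#include <stdint.h>
#include <stdio.h>

// splitmix64, a well-known 64-bit mixer, as a stand-in random source
static uint64_t state64 = 42;
static uint64_t random64(void) {
    uint64_t z = (state64 += 0x9E3779B97F4A7C15ull);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

// keep 53 random bits and scale by 2^-53: 2^53 distinct, evenly spaced doubles
static double random_unit_double(void) {
    return (random64() >> 11) * (1.0 / 9007199254740992.0); // 2^53
}

int main(void) {
    for (int i = 0; i < 4; i++)
        printf("%.17f\n", random_unit_double());
    return 0;
}
```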
Further reading: The nextafter function and Uniform random floats: How to generate a double-precision floating-point number in [0, 1] uniformly at random given a uniform random source of bits.