The POSIX drand48() function returns a floating point number in [0, 1) with 48 bits of entropy. This is typically done by taking the IEEE 754 double representation of 1.0, setting the first 48 bits of the mantissa to 48 random bits, and then subtracting 1.0. This yields a uniformly distributed floating point number in this range.
To your claim about the 1-to-257 bias: note that valid floating point numbers are much denser around 0 than around 0.5, so even with a perfectly uniform generator, hitting exactly 0.0 is much less probable than hitting exactly 0.5, because the probability mass near zero is spread over many more representable values than the probability mass near 0.5.
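For illustration only, here is a minimal sketch of the drand48-style construction described above (not the actual libc implementation). The splitmix64 helper is just a stand-in for the 48-bit random source:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* splitmix64: a stand-in source of random bits; any decent generator works. */
static uint64_t state64 = UINT64_C(0x9E3779B97F4A7C15);
static uint64_t next64(void) {
    uint64_t z = (state64 += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Sketch of the construction described above: take the bit pattern of 1.0,
   fill the top 48 bits of the 52-bit mantissa with random bits (giving a
   value in [1, 2)), then subtract 1.0. */
static double drand48_style(void) {
    uint64_t bits = UINT64_C(0x3FF0000000000000) | ((next64() >> 16) << 4);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d - 1.0; /* uniformly spaced multiples of 2^-48 in [0, 1) */
}

int main(void) {
    for (int i = 0; i < 5; i++) printf("%.17g\n", drand48_style());
    return 0;
}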
If hitting exactly zero is much less probable than hitting exactly 0.5, then I would argue that you do not have a uniform distribution…
I believe the notion of a uniform distribution is a bit odd when applied to floating point numbers, which I think is what fuz is referring to. Since floating point numbers are denser around 0, if you generate a uniformly distributed real number and round it to the nearest representable floating point number, then by definition numbers around 0 will be more likely *if* your real number distribution is uniform and can produce an arbitrary real number (which dividing an integer by 2^24 cannot do).
The alternative route is to generate a uniformly distributed representable floating point number – which seems to imply that the goal is to have each floating point number in [0,1] occur with the same probability – which of course results in a non-uniform distribution over the reals, and can be achieved by generating approximately 30 bits of randomness and filling the mantissa/exponent with them (or, to be more precise, a random number in 0..1,056,964,610).
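One way to realize "each representable float in [0,1] equally likely" is to exploit the fact that nonnegative IEEE 754 floats, ordered by value, have bit patterns that are ordered as unsigned integers. A hedged sketch follows; the xorshift32 generator and rand_below helper are stand-ins, and note this version also includes zero and the subnormals, so its count differs slightly from the figure quoted above:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in 32-bit generator (xorshift32); any unbiased source works. */
static uint32_t s32 = 2463534242u;
static uint32_t next32(void) {
    s32 ^= s32 << 13; s32 ^= s32 >> 17; s32 ^= s32 << 5;
    return s32;
}

/* Unbiased integer in [0, n) by rejection sampling. */
static uint32_t rand_below(uint32_t n) {
    uint32_t r, limit = UINT32_MAX - UINT32_MAX % n;
    do { r = next32(); } while (r >= limit);
    return r % n;
}

/* For nonnegative IEEE 754 floats, value order equals unsigned bit-pattern
   order, so a uniform bit pattern in [0, 0x3F800000] (the pattern of 1.0f)
   gives a uniformly chosen representable value in [0, 1]. */
static float uniform_over_representables(void) {
    uint32_t pattern = rand_below(UINT32_C(0x3F800001)); /* 0 .. 0x3F800000 */
    float f;
    memcpy(&f, &pattern, sizeof f);
    return f;
}

int main(void) {
    for (int i = 0; i < 5; i++) printf("%.9g\n", uniform_over_representables());
    return 0;
}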
J.S. Nelson says:
That’s an interesting distinction to make. In most applications I can imagine, one would want the latter thing, where we’re trying to uniformly pick out numbers in a finite set that happens to be the set of floating point numbers in [0,1], without regard to whether that distribution approximates a uniform distribution over the real range that contains them. E.g. if we’re using these numbers for cryptographic purposes, any non-uniform probability in terms of which actual floating point numbers our process selects is going to result in an entropy loss and will generally weaken our process.
I’m sure there are applications where you’d really want the former thing (a distribution over the floating point numbers in [0,1] that approximates a uniform distribution over the reals in that range) but my brain is tuned to information theory and I can’t think of one off the top of my head. I’d be interested in hearing a simple example if anyone has one.
Andrew says:
If choosing a value in [0-e, 0+e] is as likely as choosing one in [0.5-e, 0.5+e] for reasonable values of e, you have a uniform distribution. The probability of hitting an exact value is affected by the fact that each “exact value” is an approximation for an interval of a different width. With the actual real numbers, the probability of hitting any exact value is exactly zero, so with floats the probability of hitting an exact value only reflects the floats’ failure to really be reals.
Severin Pappadeux says:
Any method which populates [1,2) and then subtracts 1 to get U[0,1) will inevitably lose some random bits – there are twice as many floats in [0,1) as in [1,2).
Sebastiano Vigna says:
That’s entirely false. Try
#include <assert.h>
#include <stdint.h>

static inline double to_double(uint32_t x) {
    /* 0x3F8 << 20 is the bit pattern of 1.0f; x fills the 23 mantissa bits. */
    const union { uint32_t i; float f; } u = { .i = UINT32_C(0x3F8) << 20 | x };
    return u.f - 1.0f;
}

int main() {
    for (int i = 0; i < 1 << 23; i++) assert(to_double(i) * (1 << 23) == i);
}
Did you read http://xoroshiro.di.unimi.it/random_real.c ?
No, but it looks like an interesting reference.
Actually, you did! It’s the code linked by your reference http://mumble.net/~campbell/2014/04/28/uniform-random-float above…
“meaning that for every integer in [0,2^24), there is one and only one integer in [0,1).”
Should that read “one and only one floating point number in [0,1)” instead?
Andrew Dalke says:
I used nextafterf() to count the number of floats between 0.0 and 1.0. The following program takes only a couple seconds to report that there are 1,065,353,216 such values, which is slightly larger than 1,056,964,608. I don’t know why there is a difference.
#include <math.h>
#include <stdio.h>

int main(void) {
    float x = 0.0f;
    int count = 0;
    while (x < 1.0) {
        x = nextafterf(x, 1.0);
        count += 1;
    }
    printf("count: %d\n", count);
    return 0;
}
My guess is that the extra floating point numbers that you detected are denormalized representations which don’t fit the patterns originally presupposed in the article.
Floating point representations are more complex than they appear.
Mike A says:
One reason the count could be wrong is that the post only counts “normal numbers”, but there are extra values between the smallest positive non-zero normal value and zero, the so-called “denormal numbers”. The correct calculation should have been 127 × 2^23 (126 × 2^23 normals, 1 zero, and 2^23 − 1 denormals), which equals 1,065,353,216 and matches your count exactly.
The correct calculation should have been (…)
I explicitly excluded denormal numbers. It is debatable whether they should be included.
In any case, it does not change the count very much.
What’s the mean of all the floating-point numbers in [0,1]? I would assume it’s nowhere near 0.5.
That’s correct. The mean is 0.0119, as you can verify with the following program…
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main() {
    double total = 0;
    size_t count = 0;
    float val = 0;
    for (uint32_t x = 0; x < 0xFFFFFFFF; x++) {
        memcpy(&val, &x, sizeof(x));
        if (isnormal(val) && (val >= 0) && (val <= 1.0)) {
            total += val;
            count += 1;
        }
    }
    printf("%f %zu \n", total, count); /* mean = total / count */
}
It is very far from 0.5. Of course, you can find a uniform sample of the numbers in [0,1)… but there are many more floats near 0…
For a true uniform number (32-bit float), half the time the biased exponent should be 126, a quarter of the time 125, one eighth of the time 124, etc. You can force that. The lowest set bit or highest set bit of, say, a random 32- or 64-bit integer will give the right distribution to add/subtract from the exponent. Then just put random bits in the mantissa.
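To illustrate the "lowest set bit" remark, a small sketch (assuming the GCC/Clang __builtin_ctzll builtin, with xorshift64 as a stand-in generator): the trailing-zero count of a uniform random word is 0 half the time, 1 a quarter of the time, and so on, which is exactly the offset to subtract from the exponent.

#include <stdint.h>
#include <stdio.h>

/* xorshift64: stand-in for any uniform 64-bit generator. */
static uint64_t state64 = UINT64_C(88172645463325252);
static uint64_t next64(void) {
    state64 ^= state64 << 13; state64 ^= state64 >> 7; state64 ^= state64 << 17;
    return state64;
}

int main(void) {
    /* Trailing-zero counts of uniform random words are 0 w.p. 1/2, 1 w.p. 1/4,
       ... matching biased exponents 126, 125, 124, ... described above. */
    long histogram[8] = {0};
    for (long i = 0; i < 1000000; i++) {
        uint64_t r = next64();
        int offset = r ? __builtin_ctzll(r) : 63;
        if (offset < 8) histogram[offset]++;
    }
    for (int k = 0; k < 8; k++) /* expect roughly 500000, 250000, 125000, ... */
        printf("biased exponent %d (offset %d): %ld\n", 126 - k, k, histogram[k]);
    return 0;
}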
I guess the error is “just putting random bits in the mantissa.”
That would probably lead to bias. If you look at the methods in the code by Vigna (a famous person, by the way), there are two ways given to get floats.
Anyway, it depends on what you want random numbers for. For evolutionary algorithms it is better to make full use of all the precision available, even at the cost of some slight bias. Also, for rare-event code, a uniform random number quantized into 2^24 pieces would be useless: you would never get a number in the interval [1e-27, 1e-26], for example. That is a major bias in that application.
I meant to mention a different number format that has a less crazy density of low-magnitude numbers:
https://en.wikipedia.org/wiki/Unum_%28number_format%29
Actually, there are an unlimited number of floating point numbers if we get away from the implementation-dependent hardware straitjacket.
Just one example: take the sequence 0.1, 0.01, …, 0.00001 and so on. Yes, that is an endless progression. Any questions?
Guest says:
The approaches that use division (which is a very complex operation in itself) and convert random integers to floating point seem cumbersome to me. Is it possible to work something out from the bit representation?
For example, something like this. Draw a random bit; if it equals one, then the exponent is 2^-1, which corresponds to the interval [1/2 .. 1). Otherwise draw another bit; if it equals one, the exponent is 2^-2, which corresponds to the interval [1/4 .. 1/2), and so on. This gives us uniformity. Once we have chosen the interval, just fill the mantissa with 23 random bits. Such a generator can certainly produce all possible floating point numbers. It’s also easy to generate denormals this way, if necessary.
Will something like this work?
A bit-by-bit approach is unlikely to be efficient in hardware.
Guest says:
Hello! But it’s bit-by-bit only in concept. In reality that’d be very approximately something like:
e = ctz(random(2^126))
m = random(2^23)
Where e is the bit representation of the exponent (so it should be in the [1, 126] interval), and ctz gives you the index of the least-significant non-zero bit.
In practice that would probably be 2-4 calls to the base random generator plus CTZ/CLZ instructions. Surely the division is much more expensive than that! I’m more worried about proving that this approach will give a “good” distribution.
This is basically the same approach taken by the earlier linked http://xoroshiro.di.unimi.it/random_real.c . It gives a complete treatment of how & why this works.
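For concreteness, here is a rough sketch of the construction just described, under some stated assumptions: a stand-in xorshift64 generator, the GCC/Clang __builtin_ctzll builtin, and only one 64-bit draw for the exponent, so values below 2^-64 and subnormals are never produced. It is an illustration, not the random_real.c implementation.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* xorshift64: stand-in for any uniform 64-bit generator. */
static uint64_t state64 = UINT64_C(88172645463325252);
static uint64_t next64(void) {
    state64 ^= state64 << 13; state64 ^= state64 >> 7; state64 ^= state64 << 17;
    return state64;
}

/* Pick the binade [2^-k, 2^-(k-1)) with probability 2^-k using a trailing-zero
   count, then fill the 23 mantissa bits with random bits. */
static float random_float01(void) {
    uint64_t r = next64();
    int k = r ? __builtin_ctzll(r) + 1 : 64;             /* P(k) = 2^-k, k >= 1 */
    uint32_t exponent = (uint32_t)(127 - k);             /* biased: 126, 125, ... */
    uint32_t mantissa = (uint32_t)(next64() & 0x7FFFFF); /* 23 random bits */
    uint32_t bits = (exponent << 23) | mantissa;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;                                            /* a float in (0, 1) */
}

int main(void) {
    for (int i = 0; i < 8; i++) printf("%.9g\n", random_float01());
    return 0;
}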