Some libraries such as Facebook AI FAISS use single precision, so they only have 7 digits.
When working with squared values, e.g., when computing variance, covariance, PCA, or kmeans, going back from squares to linear values tends to cut this precision in half. So 4 digits for single precision, 8 for double.
Try computing the variance of 10000, 10002 in some of these tools… or of 1000000 and 1000000.02 in some SQL databses.
The variance of 100000000 and 100000002 is a problem with 32-bit floating-point numbers, but I think that the variance of 10000 and 10002 is fine.
Can you elaborate ?
George Spelvinsays:
Actually, squared values are an exponent problem, not a mantissa problem. Squaring can be thought of (very approximately) as moving the msbit of the mantissa to the lsbit of the exponent.
So any two distinct floats have distinct squares, if there is no over/underflow This is not true in reverse; on average each float has one other which shares the same square root.
So this mean that an algorithm which computes an accurate variance loses one additional lsbit when converting to standard deviation, but that’s not horrible.
The problem with variance is cancellation; the naive algorithms require a large amount of excess precision. But if you have a good estimate of the mean (e.g. using a first pass), then the two-pass compensated algorithm produces an excellent variance without needing extra-precision temporaries.
Erich Schubertsays:
Several libraries still use the naive approach when computing such values!
E.g. FAISS for PCA (and PCA based indexing…)
E.g., sklearn when computing Euclidean distance using sqrt(a²+b²-2ab) – but they always use double precision here to lessen the effects for single precision input data.
Some libraries such as Facebook AI FAISS use single precision, so they only have 7 digits.
When working with squared values, e.g., when computing variance, covariance, PCA, or kmeans, going back from squares to linear values tends to cut this precision in half. So 4 digits for single precision, 8 for double.
Try computing the variance of 10000, 10002 in some of these tools… or of 1000000 and 1000000.02 in some SQL databses.
The variance of 100000000 and 100000002 is a problem with 32-bit floating-point numbers, but I think that the variance of 10000 and 10002 is fine.
Can you elaborate ?
Actually, squared values are an exponent problem, not a mantissa problem. Squaring can be thought of (very approximately) as moving the msbit of the mantissa to the lsbit of the exponent.
So any two distinct floats have distinct squares, if there is no over/underflow This is not true in reverse; on average each float has one other which shares the same square root.
So this mean that an algorithm which computes an accurate variance loses one additional lsbit when converting to standard deviation, but that’s not horrible.
The problem with variance is cancellation; the naive algorithms require a large amount of excess precision. But if you have a good estimate of the mean (e.g. using a first pass), then the two-pass compensated algorithm produces an excellent variance without needing extra-precision temporaries.
Several libraries still use the naive approach when computing such values!
E.g. FAISS for PCA (and PCA based indexing…)
E.g., sklearn when computing Euclidean distance using sqrt(a²+b²-2ab) – but they always use double precision here to lessen the effects for single precision input data.