estimateCardinality will panic when any input hashes to 0.
It will also grossly overestimate the cardinality if any input hashes to a small integer. Because the hash is deterministic, such inputs are easy to construct, so this function should probably not be used in production code or, indeed, in example code.
Sent too early; I meant to add “or, indeed, in example code, absent a very large warning/disclaimer comment”.
Henrik Johansen says:
I think the 2^24 random float values in 0 – 1 are simpler to explain if you start from the uniformity requirement.
Within each exponent the values are evenly spaced, and the exponent with the largest spacing between values is -1, covering 0.5 – 1.
To stay uniform, you can only use as many values in 0 – 0.5 as there are in 0.5 – 1, hence
2^23 * 2 = 2^24
Thanks for explaining
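The counting argument above can be checked directly. This is a minimal Python sketch, assuming IEEE 754 binary32 ("float32") values; the helper names `to_f32`, `next_f32_up` and `grid_value` are made up for this illustration and do not come from the post.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_f32_up(x: float) -> float:
    """Next float32 above a positive float32 x, found by incrementing its bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


# Spacing between neighbouring float32 values in [0.5, 1), i.e. exponent -1.
gap_top = next_f32_up(0.5) - 0.5

# Spacing one exponent lower, in [0.25, 0.5): finer than the grid can use.
gap_below = next_f32_up(0.25) - 0.25

# Number of float32 values in [0.5, 1), and the size of the uniform grid on [0, 1)
# when [0, 0.5) is restricted to the same spacing.
count_top = round(0.5 / gap_top)
grid_size = 2 * count_top


def grid_value(k: int) -> float:
    """k-th point of the uniform grid, for 0 <= k < 2**24."""
    return k / 2**24


print(gap_top == 2**-24, gap_below == 2**-25, grid_size == 2**24)
```

Every `grid_value(k)` survives a round trip through `to_f32` unchanged, since an integer below 2^24 fits in the 24-bit significand of a float32.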