It might be because the bitset is more cache friendly. I personally love the bitmagic C++ library (http://bmagic.sourceforge.net/) as it handles sparse sets really well.
You can also run the code and see for yourself what happens on your own machine. Results may vary.
BTW I am not saying that BitSet is bad. My StaticBitSet class just does better, it seems, on this particular test.
J. Andrew Rogerssays:
Daniel, this is entirely consistent with my experience. A well-implemented bit set is a nearly optimal data structure for modern CPU architectures. It is brute force but in a manner CPUs are highly optimized for. If you are saturating all the ALUs then the number of entry evaluations per clock cycle is very high. Hash sets have to overcome their cache line overhead and relatively low number of evaluations per clock cycle.
As you undoubtedly know, when bit sets become large/sparse enough to become inefficient, a very good alternative is compressed bit sets.
Of course. I’m just wondering since I have some extremely performance-sensitive code which uses java.util.BitSet — it’ll be nice to benchmark it there too, given those numbers….
to be more specific: do you think that one could combine bitset and hashset?
I mean a hashset which is baken by (linked or whatever) several bitsets?
Ivansays:
Hi,
would be nice to see some b search + sorted vector in comparison, like in your linked blog post. Even more interesting would be some cache aware or even (if they exist) cache oblivious “kind of b search + sorted vector”. I guess binary tree “flattened” to array would behave nicely with regards to cache perf.
I found the SparseArray/LongSparseArray of the android project. Still a *full* re-allocation is necessary if the space is not sufficient but the key/values are stored very compact. Access is done via binary search, so not O(1) …
See the com.ibm.wala.util.intset.IntSet interface and its implementations. Efficient representation of integer sets is critical to the scalability of many program analyses.
Here’s another approach: randomize the integers using a pseudorandom permutation (aka block cipher), and divide them into buckets indexed by the MSB of their randomized value. (The bucket size should be chosen to fit into a cache line in expectation.) Then to look up an integer, encode it and scan its bucket (linear search should suffice if the bucket fits in a cache line). To intersect two of these sets, intersect each corresponding bucket (sets of different sizes will have different bucket counts, so this requires intersecting each bucket of the smaller set with all buckets in the larger set of which that bucket’s index is a prefix). Again, this is just a linear scan on cache-line-sized buckets (if encoded values are stored in order within a bucket then you can pick up the scan where you left off from the last value you looked up). Note that no decoding is required for intersections if both sets share the same permutation (and you don’t want to enumerate the result). You can enumerate the whole set simply by scanning and decoding each bucket. I’ve been able to get around 60% compression this way with a few million 32-bit integers, and around 78% with a more complex version of the structure (using Elias-Fano coding in the buckets). For the permutation, any 2-round balanced Feistel network with a fast round function works fine; a couple of rounds of RC5 (too weak for crypto!) seems fast enough in practice.
Tobin Bakersays:
Heh, I forgot to mention the detail that achieves compression in this scheme: each bucket only stores the suffixes of the encoded value (the prefix is implicitly stored in the bucket index). I don’t have Knuth handy but I believe he refers to this technique as “quotienting”. Of course since buckets are variable-sized, an index storing bucket counts or offsets is necessary, but that requires very little space relative to the compression gain. (The distribution of bucket sizes is well described by the Poisson approximation to the binomial distribution.)
It might be because the bitset is more cache friendly. I personally love the bitmagic C++ library (http://bmagic.sourceforge.net/) as it handles sparse sets really well.
Have you got perf figures for that StaticBitSet class, compared to a BitSet?
@Justin
They are in the repository, right there:
https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2012/11/13/results.txt
You can also run the code and see for yourself what happens on your own machine. Results may vary.
BTW I am not saying that BitSet is bad. My StaticBitSet class just does better, it seems, on this particular test.
Daniel, this is entirely consistent with my experience. A well-implemented bit set is a nearly optimal data structure for modern CPU architectures. It is brute force but in a manner CPUs are highly optimized for. If you are saturating all the ALUs then the number of entry evaluations per clock cycle is very high. Hash sets have to overcome their cache line overhead and relatively low number of evaluations per clock cycle.
As you undoubtedly know, when bit sets become large/sparse enough to become inefficient, a very good alternative is compressed bit sets.
Of course. I’m just wondering since I have some extremely performance-sensitive code which uses java.util.BitSet — it’ll be nice to benchmark it there too, given those numbers….
nice results :)!
this seems to be a bit offtopic but 🙂
does somebody know of a sparse hashset implementation in Java which is more memory efficient than the THashSet?
I need some memory efficient datastructure for the case that there are areas of consecutive integer values … or how would you implement that?
to be more specific: do you think that one could combine bitset and hashset?
I mean a hashset which is baken by (linked or whatever) several bitsets?
Hi,
would be nice to see some b search + sorted vector in comparison, like in your linked blog post. Even more interesting would be some cache aware or even (if they exist) cache oblivious “kind of b search + sorted vector”. I guess binary tree “flattened” to array would behave nicely with regards to cache perf.
@Daniel
of course I’m aware of your nice projects 🙂
but I think compressed bitsets are not an option for me as I need random access.
@Justin
You are welcome to steal my StaticBitSet implementation and benchmark it for your purposes. It is available on github.
@Rogers @Ivan
Agreed. It would be interesting to throw in more data structures and more strategies. I will do so in the future.
@Daniel
Sorry for the confusion! I should have thought about my problem a bit more 🙂
I need a hashmap or a ‘compressed’ integer array which efficiently ‘maps’ ints to ints (or longs to longs)
@Peter
It is a reasonable request, but I don’t know of an existing solution.
You might want to look at compressed bitsets. They could meet your needs. See https://github.com/lemire/simplebitmapbenchmark for a comparative benchmark.
@Peter
Yes, I understand what you seek.
If you ever find a good solution, please email me.
I found the SparseArray/LongSparseArray of the android project. Still a *full* re-allocation is necessary if the space is not sufficient but the key/values are stored very compact. Access is done via binary search, so not O(1) …
Cool post! For those who are interested, we have a wide variety of integer set implementations in WALA:
http://wala.sourceforge.net
See the com.ibm.wala.util.intset.IntSet interface and its implementations. Efficient representation of integer sets is critical to the scalability of many program analyses.
Interesting post! http://www.censhare.com/en/aktuelles/censhare-labs/efficient-concurrent-long-set-and-map shows a different approach for sparse sets using a bit trie.
Here’s another approach: randomize the integers using a pseudorandom permutation (aka block cipher), and divide them into buckets indexed by the MSB of their randomized value. (The bucket size should be chosen to fit into a cache line in expectation.) Then to look up an integer, encode it and scan its bucket (linear search should suffice if the bucket fits in a cache line). To intersect two of these sets, intersect each corresponding bucket (sets of different sizes will have different bucket counts, so this requires intersecting each bucket of the smaller set with all buckets in the larger set of which that bucket’s index is a prefix). Again, this is just a linear scan on cache-line-sized buckets (if encoded values are stored in order within a bucket then you can pick up the scan where you left off from the last value you looked up). Note that no decoding is required for intersections if both sets share the same permutation (and you don’t want to enumerate the result). You can enumerate the whole set simply by scanning and decoding each bucket. I’ve been able to get around 60% compression this way with a few million 32-bit integers, and around 78% with a more complex version of the structure (using Elias-Fano coding in the buckets). For the permutation, any 2-round balanced Feistel network with a fast round function works fine; a couple of rounds of RC5 (too weak for crypto!) seems fast enough in practice.
Heh, I forgot to mention the detail that achieves compression in this scheme: each bucket only stores the suffixes of the encoded value (the prefix is implicitly stored in the bucket index). I don’t have Knuth handy but I believe he refers to this technique as “quotienting”. Of course since buckets are variable-sized, an index storing bucket counts or offsets is necessary, but that requires very little space relative to the compression gain. (The distribution of bucket sizes is well described by the Poisson approximation to the binomial distribution.)
Also check out the SparseFixedBitSet from the Lucene project: https://lucene.apache.org/core/7_2_1/core/org/apache/lucene/util/SparseFixedBitSet.html