17th February 2012, 10 min read

Bitmaps are surprisingly efficient

12 thoughts on “Bitmaps are surprisingly efficient”

Anton says:

February 18, 2012 at 1:43 am

May be, instead of branching, the following formula will be more efficient:

newVal = oldVal * (1-maskBit) + updatedVal*maskBit

?
Luc says:

February 18, 2012 at 5:15 am

Actually, it’s not bitmap that are fast. It’s the algorithm: it seems it’s faster to update while copying instead of doing it afterwards, random accessing again the (big) data. Locality of reference does matter.

If exceptionpos is sorted, i believe it would be even faster to compare to *pos. You would then save all these bit operations for a simple integer comparison. This way, bitmap is just another implementation for looking up inside the copy loop if the value is to be changed.
Daniel Lemire says:

February 20, 2012 at 8:55 am

@Anton

This is clever, but it won’t quite work in the context I have described because we want to increment the pointer to the updated value if and only if we use it.

What you describe would work if we had two same-length vectors. (And yes, it could be slightly faster to replace branching by multiplications. Branching is really bad on many processors.)
Daniel Lemire says:

February 20, 2012 at 9:28 am

@Luc

I do invite you to provide a faster alternative. I find it difficult to beat the bitmap over the whole range of patching densities. There is always some overhead.
zokier says:

February 23, 2012 at 9:50 am

Have you considered std::vector instead of raw bitmap?
Daniel Lemire says:

February 23, 2012 at 9:59 am

@zokier

You mean std::vector? The underlying implementation is probably close to what I offer here.
Martin Trenkmann says:

February 26, 2012 at 5:36 pm

Hi Daniel.

Interesting problem. I tried hard to beat the bitmap solution, but finally I had to surrender.

However, I found some errors in your original code: (Note by D. Lemire: these errors were corrected and my benchmark updated)
1. In line 163, where you set the bits of your bitmap, the AND operation is not correct, it has to be an OR operation. Thus in your bitmapcopy functions there is no update operation at all.
2. Also in both bitmapcopy functions there is an off-by-one error with the index i.

I corrected the errors, made my own copy versions, and ran some performance tests. You find the updated version of the code under http://pastebin.com/qQy88c8V
(Please click the RAW button at the top to see the unwrapped result tables at the end of the listing. The columns contain pairs of milliseconds and number of update operations)

Furthermore, I think the overall coding style is not the best and very error-prone. In C++ one should prefer iterators over raw pointers. I personally decided to use operator[] for accessing the vectors, because your problem setup uses index values. I tested a lot and I found no performance differences in using pointers, iterators or operator[]. Indeed, for the compiler all these operations are equivalent.
A second issue are the underscored qualifiers __attribute__ and __restrict__ in your code. These are compiler extension and shall be ommitted. The compiler knows best how to optimize. See also N3242 Â§17.2(2) of the C++ ISO standard.

OK, let’s come to my implementations:

memcopy: Nothing to say here. Copying a chunk of memory without any modifications is faster than copying element-wise using a loop.

updatingcopy: The simplest solution for the given problem with just clean syntax. It outperforms the patchedcopy versions and do not use raw pointer arithmetics.

updatingcopy2: Here I tried to group an index value together with its update value in a pair to take advantage of locality. However, the effect is negligible.

updatingcopy3: Here I tried to avoid branching and made the iteration over the vector with the (index, update value)-pairs an outer loop. I suppose this is the same approach as in patchedcopy2, except that my version is human-readable.

Conclusion: The bitmapcopy versions are not to beat for high densities only. At the other hand the resulting code is more error-prone and hard to read. I showed that just clean and simple code gives comparable performance results, at least for low densities. When trying to optimize do not hack your compiler, especially for C++ use its type system as much as possible and avoid ugly C-style casts. What helps is locality of data to minimize cache misses. Bitmaps are an good example for this, definition of variables as close as possible to their usage is another one. Finally, maintainability of code really matters.

Have fun!
Daniel Lemire says:

February 27, 2012 at 2:28 pm

@Martin

Thanks Martin! I really appreciate the bug report and I fixed my code.

Note that memcpy is *not* more efficient that loop copy using my compiler and my CPU.

You are using an older GCC (4.4). Have you tried it with a recent GCC (4.6.2)? In my tests, this makes a huge difference.

You are also using an inferior processor. My i7 has probably better pipelining than your Core 2.

Anyhow, here are my results with your code:

g++-4 -Ofast -o bitmaptrenkmann bitmaptrenkmann.cpp density(%) memcopy loopcopy bitmapcopy bitmapcopy8bits patchedcopy patchedcopy2 updatingcopy updatingcopy2 updatingcopy3 100 25 25 43 0 26 0 65 32000000 59 32000000 49 32000000 42 32000000 48 32000000 16.6667 25 25 50 0 25 0 47 5333334 50 5333334 45 5333334 37 5333334 39 5333334 9.09091 29 30 48 0 26 0 43 2909091 40 2909091 38 2909091 31 2909091 30 2909091 6.25 25 25 47 0 26 0 41 2000000 40 2000000 40 2000000 31 2000000 30 2000000 4.7619 25 24 46 0 26 0 39 1523810 40 1523810 39 1523810 29 1523810 29 1523810 3.84615 26 25 44 0 27 0 39 1230770 37 1230770 38 1230770 28 1230770 28 1230770 3.22581 27 25 47 0 25 0 37 1032259 40 1032259 36 1032259 29 1032259 28 1032259 2.77778 26 25 43 0 26 0 36 888889 42 888889 40 888889 28 888889 34 888889 2.43902 27 25 46 0 25 0 35 780488 44 780488 42 780488 32 780488 34 780488 2.17391 25 24 46 0 26 0 34 695653 44 695653 40 695653 32 695653 33 695653

Summary: Basically, using your code I get the same performance and thus the same conclusion. I agree that using STL is nicer and better (I’m a big fan of STL) than using pointer arithmetic, especially when you get the same performance!
Itman says:

February 27, 2012 at 5:55 pm

Haaaah, cool!

We are actually using (sparse)-bitmaps in our search engine for boolean queries. They work pretty well.
Daniel Lemire says:

February 27, 2012 at 6:20 pm

@Itman

Doing the same type of micro-benchmarks with sparse bitmaps could be interesting.
Thomas says:

April 7, 2012 at 6:50 am

are you sure you have a macbook air with an i7? did apple give you a not-yet-to-the-public-released mba? 😉
Daniel Lemire says:

April 7, 2012 at 4:33 pm

@Thomas

Absolutely. MacBook Airs come with a core i5 or core i7 intel processor:

http://store.apple.com/ca/configure/MC966LL/A

In fact, it is not possible (at least in Canada) to buy them with anything less than a core i5 processor.