6th March 2012, 12 min read

How fast is bit packing?

18 thoughts on “How fast is bit packing?”

John Regehr says:

March 6, 2012 at 10:40 pm

I’m missing something… how is packing 32-bit integers into 17 bits a savings of 90%? It sounds closer to 50%.
Daniel Lemire says:

March 6, 2012 at 10:46 pm

@John

Well. I have that 32/17 – 1 is about 0.88, or roughly 90%. But I grant you that it is less confusing to say 50%, so I have updated my blog post accordingly.
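
The two figures measure different ratios, and a quick calculation makes the difference concrete. This is only a sketch of the arithmetic; the variable names are illustrative and do not come from the post's source code.

```python
original_bits = 32   # width of an unpacked integer
packed_bits = 17     # width after bit packing

# Fraction of the original storage that packing removes (the "about 50%" reading).
space_saved = 1 - packed_bits / original_bits

# How much larger the unpacked form is than the packed form (the "about 90%" reading).
unpacked_overhead = original_bits / packed_bits - 1

print(f"space saved by packing: {space_saved:.1%}")               # 46.9%
print(f"extra space used when unpacked: {unpacked_overhead:.1%}")  # 88.2%
```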
Jay Stein says:

March 7, 2012 at 10:25 am

Please see my US patent no. 5,602,550, filed in 1995, granted in 1997, which describes a complete implementation of an adaptive compression utilizing bit packing, but also allowing for bit packing of deltas between successive values. This algorithm was built for speed.
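
As a generic illustration of why packing deltas can help, and not a rendering of the patented method, the sketch below compares the bit width needed for a sorted list of values with the width needed for the differences between successive values. The function names and the sample data are assumptions made for this example.

```python
def delta_encode(values):
    """Return the first value and the differences between successive values."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas

def delta_decode(first, deltas):
    """Rebuild the original values by accumulating the differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out

def bits_needed(values):
    """Smallest bit width that can hold every non-negative value in the list."""
    return max(v.bit_length() for v in values)

# Hypothetical sorted input: the values are large but close together.
values = [1000, 1003, 1010, 1011, 1020]
first, deltas = delta_encode(values)

print(bits_needed(values))  # width needed to pack the raw values
print(bits_needed(deltas))  # width needed to pack the deltas
assert delta_decode(first, deltas) == values
```

The sketch assumes non-negative, non-decreasing input; unsorted data would produce negative deltas and need a signed encoding.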
Patrick Stein says:

March 7, 2012 at 11:05 am

I’m missing something here. I was hoping to see a speed comparison between bit-packing and not bit-packing.

Given an array of k-bit integers stored in 32-bit integers, how long does it take to copy that array? how long does it take to pack that array? how long does it take to unpack the packed array?
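
To make the three operations in this question concrete, here is a minimal sketch of packing and unpacking k-bit integers into 32-bit words. It is not the code from the post: the function names `pack` and `unpack` are made up for this example, and timings taken in Python would say little about compiled code, so it only shows what would be measured.

```python
def pack(values, bits):
    """Pack integers that each fit in `bits` bits (1..32) into 32-bit words."""
    mask = (1 << bits) - 1
    words, acc, filled = [], 0, 0
    for v in values:
        acc |= (v & mask) << filled      # append the value above the pending bits
        filled += bits
        while filled >= 32:              # a full 32-bit word is ready to emit
            words.append(acc & 0xFFFFFFFF)
            acc >>= 32
            filled -= 32
    if filled:                           # flush the last, partially filled word
        words.append(acc & 0xFFFFFFFF)
    return words

def unpack(words, bits, count):
    """Recover `count` integers of width `bits` from 32-bit words."""
    mask = (1 << bits) - 1
    out, acc, filled = [], 0, 0
    it = iter(words)
    for _ in range(count):
        while filled < bits:             # pull in another word when bits run out
            acc |= next(it) << filled
            filled += 32
        out.append(acc & mask)
        acc >>= bits
        filled -= bits
    return out

values = [5, 131071, 0, 70000]           # each fits in 17 bits
words = pack(values, 17)
assert unpack(words, 17, len(values)) == values
assert pack(values, 32) == values        # at 32 bits, packing is a plain copy
```

Values that straddle a word boundary are handled by the accumulator `acc`, which holds the leftover bits between iterations.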
Daniel Lemire says:

March 7, 2012 at 11:16 am

@Patrick

“I’m missing something here. I was hoping to see a speed comparison between bit-packing and not bit-packing.”

You get the non-packed approach when bit is set to 32.

Don’t forget that my source code is available (see link) so you can run your own tests if you want!
Patrick Stein says:

March 7, 2012 at 11:45 am

Indeed. You even mention that in a part I skimmed through before. Thank you.
Marsh Ray says:

March 7, 2012 at 1:02 pm

It would be relevant to know how many numbers are in the data set being packed or unpacked, and compare that to no packing at all. Cache effects are likely to dominate above various sizes.

@Jay Stein – The only proper response to that is: (rude language censored by D. Lemire) go crawl back under the rock you came from software patenter.
zav says:

March 7, 2012 at 10:56 pm

The first word of your article is spelled wrong.

That’s when I stop reading.
David says:

March 8, 2012 at 12:06 am

What a shame, zav. Most compilers are sophisticated enough to continue parsing even in the presence of syntax errors.

P.S. I think you meant “stopped,” not “stop.”
Daniel Lemire says:

March 8, 2012 at 9:27 am

@zav

I fixed the typo. Thanks for reporting it.
zav says:

March 8, 2012 at 9:49 am

Thanks Dan. I’m sure I’ll love your article. Will check it out later on today.

Cheers.
Jay Stein says:

March 8, 2012 at 12:22 pm

@Marsh Ray – My compression algorithm was patented by the company where I was employed at the time. I did not think it was worth wasting anyone’s time explaining that detail. The patent application is a publicly available explanation of the algorithm, which is relevant to the current discussion, unlike your trolling.
zav says:

March 9, 2012 at 9:41 am

Jay, would love to check out your patent. I’ve been fascinated with the potential for this since 1995 while investigating systems and methods for storing quantized delta frames in video streams. None of my PAs are as fundamental.

David, this is nice. Wish I had time to play with this at the moment. Thanks for the source and the correction. Cheers.
Michele Filannino says:

March 9, 2012 at 10:51 am

Hi Daniel,

this is my graph:
http://dl.dropbox.com/u/265383/bit_packing.png

It seems to be the opposite of the one shown in the post. What do you think?

Bye,
michele.
Daniel Lemire says:

March 9, 2012 at 11:03 am

@michele

Interesting. Can you give me some details, like processor type, compiler and so on?
Itman says:

March 12, 2012 at 8:55 am

Michele,

It is not quite the opposite. The trend is the same:
1) There is very little difference between unpacked and packed readings
2) Some packed reads are slightly more efficient than unpacked ones.
Daniel Lemire says:

March 12, 2012 at 9:06 am

@itman @michele

If you look closely at my code, you’ll notice that I use a lot of loops that can and should probably be unrolled. I actually leave them rolled when it makes sense so that the compiler has more options (compilers don’t typically “roll back” loops that were manually unrolled).

Anyhow. I adjusted the code until it looked like I got optimal results with GCC 4.6 and my particular hardware. Because Michele is using GCC 4.2, I am not surprised that the results differ.

However, even with GCC 4.2, it might be possible to tweak the results with the proper optimization flags.

As you say @itman, the results are not really all that different. But it is nice to see independent tests.
Frederico Schardong says:

March 16, 2012 at 11:25 am

Very nice post!

I’m implementing the same idea, in C and in fewer lines. The code is here: pastebin.com/SfEkqKnv

Please take a look, and if you want to help me, I’d appreciate it. 🙂

I’m getting errors. For example, when packing into a 32-bit int variable, numbers less than 17 are fine, but when they are greater than 16 it doesn’t work well… I don’t know what the problem is and would appreciate any help.

frede dot sch at gmail dot com