Daniel Lemire's blog


Compression is never worth slowing down your code?

Should you ever compress your data to reduce memory usage?

We are in 2050. Nobody uses tapes or disks anymore. The NSA has 1 billion servers. It needs that many to hold all of the data in memory. A contractor offers a marvellous algorithm that can compress data by a factor of 1,000,000,000. The problem is that it slows down all processing by a factor of two.

Does that sound like a good deal?

Let us put it into practice. The president wants the NSA to take all data and replace lower-case letters with upper-case letters, and vice versa.
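The president's task is exactly what Python's built-in `str.swapcase` does; a minimal illustration:

```python
# Swap lower-case and upper-case letters, as the president requested.
text = "Top Secret Memo"
swapped = text.swapcase()
print(swapped)  # tOP sECRET mEMO
```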

Without fancy compression, it would take all of the 1 billion servers one hour to complete the task.

With the new compression technology, you can use just one server to hold all of the data and do the processing. Unfortunately, your throughput is halved, and you have just one machine, so the task would take you 2 billion hours. To complete it in 1 hour, you would need 2 billion servers. That’s twice as many as you have.
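The back-of-envelope arithmetic above can be sketched as follows (all numbers come straight from the story):

```python
# Work is measured in server-hours.
SERVERS = 1_000_000_000        # servers the NSA owns
HOURS = 1                      # wall-clock time using every server

work = SERVERS * HOURS         # 1 billion server-hours, uncompressed

# Compression halves throughput, so the total work doubles.
work_compressed = 2 * work     # 2 billion server-hours

# On a single server holding all of the compressed data:
hours_on_one_server = work_compressed / 1   # 2 billion hours

# Servers needed to finish the compressed workload in 1 hour:
servers_needed = work_compressed / HOURS    # 2 billion servers

print(servers_needed / SERVERS)  # 2.0: twice as many as you have
```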

Thus the NSA rejects the contractor's offer. He goes on to sell the technology to North Korea, where it is used for evil purposes we shall discuss another day.

Where did I go wrong in my story?

Credit: This fun puzzle was given to me in a different form by Leonid Boytsov.