Daniel Lemire's blog

The Internet is a product of the post-industrial age

2 min read

The Internet is on fire with this question: who invented the Internet? A couple of weeks ago, the president of the USA said: Government research created the Internet so that all the companies could make money off the Internet. Crovitz replied in the Wall Street Journal: It was at the Xerox PARC…

Is C++ worth it?

2 min read

We routinely attribute the long battery life and power of our tablets and tiny laptops to better hardware. However, in many cases, this better hardware runs software that is an order of magnitude faster than older software. For example, our web browsers feel faster because JavaScript interpreters…

Why we make up jobs out of thin air

5 min read

“We prefer to invent new jobs rather than trying harder and inventing a new system that wouldn’t require everybody to have a job.” (Philippe Beaudoin) In the 21st century, people from wealthy countries work hard primarily to gain social status. We often make the mistake of tying up wealth…

Bytes or octets?

1 min read

Quick: what is the definition of a byte (as in two kilobytes)? If you said it is a unit of 8 bits, you failed. Correct answer (according to IEEE 1541): a byte is a set of adjacent bits operated on as a group; an octet is a set of 8 bits. Hence, if I refer to 1024 times 8 bits, I should avoid…
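C and C++ draw a similar line: there, a byte is whatever a `char` occupies, namely `CHAR_BIT` bits, and the language standards only guarantee that `CHAR_BIT` is at least 8, not exactly 8. A minimal sketch; the helper names `bits_in_bytes` and `bits_in_octets` are hypothetical:

```cpp
#include <cassert>
#include <climits>  // CHAR_BIT: the number of bits in a C/C++ byte
#include <cstddef>

// In C and C++, a byte is the storage unit of one char. The standards only
// require CHAR_BIT >= 8; an octet, by contrast, is always exactly 8 bits.
static_assert(CHAR_BIT >= 8, "C and C++ require bytes of at least 8 bits");

// Hypothetical helpers: convert a count of bytes or octets to a count of bits.
constexpr std::size_t bits_in_bytes(std::size_t bytes) { return bytes * CHAR_BIT; }
constexpr std::size_t bits_in_octets(std::size_t octets) { return octets * 8; }
```

On mainstream desktop and server platforms the two helpers agree, but only `bits_in_octets(1024)` is guaranteed to equal 8192 everywhere.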

Which is fastest: read, fread, ifstream or mmap?

2 min read

If you program in C/C++, you have many options to read files:
- On POSIX systems, the C library offers a low-level read function. It is as simple as it gets.
- The standard C library offers a higher-level fread function. Unlike with read, you can set a buffer size. Buffers can be good or bad. On…
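To try the comparison yourself, here is a rough sketch, assuming a POSIX system such as Linux or macOS, that reads the same file with read, fread, std::ifstream and mmap. Each hypothetical function sums the bytes of the file so the four results can be cross-checked; the `buffer_size` parameter is an assumed knob to experiment with, and the timing code is left to you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>
#include <fcntl.h>     // open (POSIX)
#include <sys/mman.h>  // mmap, munmap (POSIX)
#include <sys/stat.h>  // fstat (POSIX)
#include <unistd.h>    // read, close (POSIX)

// Each function sums every byte of a file. buffer_size is assumed > 0.

std::uint64_t sum_with_read(const char* path, std::size_t buffer_size) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return 0;
  std::vector<unsigned char> buf(buffer_size);
  std::uint64_t sum = 0;
  ssize_t n;
  while ((n = read(fd, buf.data(), buf.size())) > 0)  // one system call per chunk
    for (ssize_t i = 0; i < n; ++i) sum += buf[i];
  close(fd);
  return sum;
}

std::uint64_t sum_with_fread(const char* path, std::size_t buffer_size) {
  std::FILE* f = std::fopen(path, "rb");
  if (f == nullptr) return 0;
  std::vector<unsigned char> buf(buffer_size);
  std::uint64_t sum = 0;
  std::size_t n;
  while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)  // buffered stdio
    for (std::size_t i = 0; i < n; ++i) sum += buf[i];
  std::fclose(f);
  return sum;
}

std::uint64_t sum_with_ifstream(const char* path, std::size_t buffer_size) {
  std::ifstream in(path, std::ios::binary);
  std::vector<char> buf(buffer_size);
  std::uint64_t sum = 0;
  while (in) {
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    std::streamsize n = in.gcount();  // may be short on the last chunk
    for (std::streamsize i = 0; i < n; ++i)
      sum += static_cast<unsigned char>(buf[i]);
  }
  return sum;
}

std::uint64_t sum_with_mmap(const char* path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return 0;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 0; }
  std::size_t len = static_cast<std::size_t>(st.st_size);
  void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  if (p == MAP_FAILED) return 0;
  const unsigned char* data = static_cast<const unsigned char*>(p);
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < len; ++i) sum += data[i];
  munmap(p, len);
  return sum;
}
```

Varying `buffer_size` for the first three functions is a quick way to see how much buffering matters on your own machine.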

Do not waste time with STL vectors

3 min read

I spend a lot of time with the C++ Standard Template Library. It is available on diverse platforms, it is fast and it is (relatively) easy to learn. It has been perhaps too conservative at times: we only recently got a standard hash table data structure (with C++11). However, using STL is orders of…

On the quality of academic software

4 min read

Software is eating the world. Despite a poor year, Facebook has a market capitalization of $65 billion. This little company with barely 2000 developers is worth as much as a car maker. Students should take notice. I would expect countless students to come to college demanding top-notch software…

Data alignment for speed: myth or reality?

3 min read

Some compilers align data structures so that if you read an object using 4 bytes, its memory address is divisible by 4. There are two reasons for data alignment. First, some processors require it. For example, the ARM processor in your 2005-era phone might crash if you try to access unaligned…
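In C and C++, the usual portable way to read a value from a possibly unaligned address is to copy it with memcpy rather than casting the pointer. A minimal sketch, assuming a platform where `std::uint32_t` has 4-byte alignment; the helpers `is_aligned` and `load_u32` and the struct `Example` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is a multiple of alignment (a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: reads a 32-bit integer at any address. Going through
// std::memcpy avoids undefined behavior when p is not 4-byte aligned, and
// mainstream compilers usually turn it into a single load on hardware that
// tolerates unaligned access.
std::uint32_t load_u32(const unsigned char* p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// The compiler inserts padding after c so that x sits at an offset that
// satisfies alignof(std::uint32_t), typically 4.
struct Example {
  char c;
  std::uint32_t x;
};
```

Whether the unaligned load is actually slower is exactly the empirical question: the code above is correct either way, and only a benchmark on your hardware tells you the cost.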

Creating incentives for better science

4 min read

Popper argued that science should be falsifiable. To determine truth, we simply try to disprove a hypothesis until we are exhausted. It is a nice theory, but actual science does not follow this process. I read many funding proposals, and I have yet to read one that says: "This other guy came…

Summer reading recommendations

4 min read

What came after by Sam Winston is an intriguing sci-fi novel. It describes a near-future dystopia where a handful of large corporations have taken over the USA. After serving as a puppet of powerful interests, the government has finally been abolished. In some sense, it is the anti-libertarian novel:…

Punk money: how you can print your own currency… legally

4 min read

We all want and need money. However, for many services, paying actual dollars is inefficient. The transaction costs are too high. So we need a system whereby perfect strangers can make deals at a very small transaction cost. For this purpose, people use punk money: you publicly promise a favor in…

Computer scientists need to learn about significant digits

1 min read

I probably spend too much time reviewing research papers. It makes me cranky. Nevertheless, one thing that has become absolutely clear to me is that computer scientists do not know about significant digits. When you write that the test took 304.03 s, you are telling me that the 0.03 s is somehow…
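In C and C++, the `%g` conversion of printf treats the precision as a count of significant digits, which makes honest reporting easy. A minimal sketch; the helper name `with_sig_digits` is hypothetical, and deciding how many digits your measurement supports (say, from run-to-run variation) is up to you:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats value with `digits` significant digits. For %g, the precision
// counts significant digits rather than digits after the decimal point,
// and trailing zeros are dropped.
std::string with_sig_digits(double value, int digits) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%.*g", digits, value);
  return std::string(buf);
}
```

For instance, `with_sig_digits(304.03, 3)` yields `"304"`. One caveat: because `%g` drops trailing zeros, 1.50 with three digits prints as `1.5`; the `#` flag (`%#.*g`) keeps them, at the cost of a trailing decimal point in cases like `304.`.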

Let us abolish page limits in scientific publications

4 min read

As scientists, we are often subjected to strict page limits. These limits made sense when articles were printed on expensive paper. They are now obsolete. But we still need to print the articles on paper! At least in Computer Science, almost everyone has adopted electronic media. It is cheaper and…

How to manipulate the masses by language alone

3 min read

George Orwell, in his novel 1984, popularized the idea that by changing the language, you could change people's minds. It is easy to forget that we are routinely victims of this strategy. A fascinating example is the French language itself. I long had this image of the French Revolution as the French…

Bit packing is fast, but integer logarithm is slow

2 min read

In "How fast is bit packing?", we saw how to store non-negative integers smaller than 2^N using N bits per integer by a technique called bit packing. A careful C++ bit packing implementation is fast: e.g., over 1 billion integers per second. However, before you pack the integers, you might need to…
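Presumably, the step before packing is finding N, the number of bits needed by the largest value x, that is, floor(log2(x)) + 1. A minimal sketch under that assumption; the helpers `bits_needed` and `max_bits` are hypothetical, and the OR trick is a common technique, not necessarily the one the post uses:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bits needed to represent x (0 for x == 0), i.e. floor(log2(x)) + 1.
// A simple portable loop; GCC and Clang also offer __builtin_clz, and C++20
// provides std::bit_width.
int bits_needed(std::uint32_t x) {
  int b = 0;
  while (x != 0) { ++b; x >>= 1; }
  return b;
}

// The bit width of a bitwise OR equals the largest bit width among its
// operands, so OR-ing the whole array first needs only one logarithm.
int max_bits(const std::uint32_t* data, std::size_t n) {
  std::uint32_t acc = 0;
  for (std::size_t i = 0; i < n; ++i) acc |= data[i];
  return bits_needed(acc);
}
```

Computing one logarithm over the OR of all values, rather than one per value, keeps the per-integer work down to a single OR.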

It is what you do, not what you own

1 min read

Over 20 years ago, back when I was in high school, I went on a sailboat trip. I was so impressed that I decided I would own a sailboat one day. I realized that a sailboat was expensive, and I guess I thought that owning a boat would not only be cool but also be a symbol of my success. (Can you…

Publicly available large data sets for database research

6 min read

Most database research papers use synthetic data sets. That is, they use random-number generators to create their data on the fly. A popular generator is dbgen from the Transaction Processing Performance Council (TPC). Why is that a problem? We end up working with simplistic models. If we consider…

Do we need copyright?

7 min read

The concept of property is a social construction. Animals, such as cats, can own a piece of food, or a territory, but only as long as they are able to personally maintain a credible threat of violence. And animals can only defend concrete, physical properties, such as an area, a dead bird or a…

From counting citations to measuring usage (help needed!)

3 min read

We sometimes measure the caliber of a researcher by how many research papers they have written. This is silly. While there is some correlation between quantity and quality (people like Einstein tend to publish a lot), the paper count is easily gamed. Moreover, several major researchers have published relatively…

How fast is bit packing?

3 min read

Integer values are typically stored using 32 bits. Yet if you are given an array of integers smaller than 131 072 (that is, 2^17), you could store these numbers using as few as 17 bits each, a net saving of almost 50%. Programmers nearly never store integers in this manner despite the obvious compression…
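A plain scalar sketch of the idea, assuming the values come in a `std::vector` and are packed into 32-bit words. The names `pack` and `unpack` are hypothetical, and the code is written for clarity, not tuned for speed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs each value into `bits` consecutive bits of 32-bit words.
// Assumes 1 <= bits <= 32 and that every value is smaller than 2^bits.
std::vector<std::uint32_t> pack(const std::vector<std::uint32_t>& in, int bits) {
  const std::size_t b = static_cast<std::size_t>(bits);
  std::vector<std::uint32_t> out((in.size() * b + 31) / 32, 0);
  std::size_t pos = 0;  // current bit position in the output
  for (std::uint32_t v : in) {
    const std::size_t word = pos / 32, offset = pos % 32;
    out[word] |= v << offset;
    if (offset + b > 32) out[word + 1] |= v >> (32 - offset);  // spills into next word
    pos += b;
  }
  return out;
}

// Recovers n values previously packed with the same bit width.
std::vector<std::uint32_t> unpack(const std::vector<std::uint32_t>& packed,
                                  std::size_t n, int bits) {
  const std::size_t b = static_cast<std::size_t>(bits);
  const std::uint32_t mask = (b == 32) ? 0xFFFFFFFFu : ((1u << b) - 1);
  std::vector<std::uint32_t> out(n);
  std::size_t pos = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t word = pos / 32, offset = pos % 32;
    std::uint32_t v = packed[word] >> offset;
    if (offset + b > 32) v |= packed[word + 1] << (32 - offset);  // high part
    out[i] = v & mask;
    pos += b;
  }
  return out;
}
```

With 17 bits per value, n integers occupy ceil(17n/32) words instead of n words, which is where the saving of almost 50% comes from.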