Daniel Lemire's blog

, 1 min read

For your in-memory databases, do you really need an index?

For large data sets on disk, indexes are often essential. However, if your data fits in RAM, indexes are often unnecessary. They may even be harmful.

Consider a table made of 10,000,000 rows and 10 columns. Using normalization, you can replace each value by a 32-bit integer for a total of 381 MB. How long does it take to scan it all?

  • When the data is on disk, it takes 0.5 s to scan the data. To maximize buffering, I have used a memory-mapped file.
  • When the data is in memory, it takes 0.06 s.

Can you bring the 0.06 s figure down with an index? Maybe. But consider that :

  • Indexes use memory, sometimes forcing you to store more data on disk.
  • Indexes typically slow down construction and updates.
  • Indexes typically only improve the performance of some queries. This is especially true with multidimensional data.

Verify my results: My Java code is available on paste.bin. I ran the tests on an iMac with an Intel Core i7 (2.8 GHz) processor.

Source: This blog post was motivated by a question from Julian Hyde, of Mondrian fame.

Further reading: Understanding what makes database indexes work

Code: Source code posted on my blog is available from a github repository.