Daniel Lemire's blog


Who is going to need a database engine in 2020?

Given the Big Data phenomenon, you might think that everyone is becoming a database engineer. Unfortunately, writing a database engine is hard:

  • Concurrency is difficult. Whenever a data structure is modified by different processes or threads, it can end up in an inconsistent state. Database engines cope with concurrency in different ways: e.g., through locking or multiversion concurrency control. While these techniques are well known, few programmers have had a chance to master them.
  • Persistence is also difficult. You must somehow keep the database on a slow disk, while keeping some of the data in RAM. At all times, the content of the disk should be consistent. Moreover, you must avoid data loss as much as possible.
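
To make these two difficulties concrete, here is a minimal Python sketch of a toy key-value store. It is not how any real engine works: the class name TinyStore, its methods, and the JSON log format are all hypothetical, chosen only for illustration. It handles concurrency in the crudest way possible (a single lock around every operation) and persistence with an append-only log that is replayed at startup.

```python
import json
import os
import threading


class TinyStore:
    """Hypothetical toy key-value store: one lock, one append-only log."""

    def __init__(self, log_path):
        self._lock = threading.Lock()  # serializes all readers and writers
        self._data = {}                # the in-RAM copy of the database
        self._log_path = log_path
        self._replay()                 # rebuild the RAM state from disk
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self._log_path):
            return
        with open(self._log_path, "r", encoding="utf-8") as f:
            for line in f:
                try:
                    key, value = json.loads(line)
                except ValueError:
                    break  # unreadable record (e.g., cut short by a crash): stop here
                self._data[key] = value

    def put(self, key, value):
        with self._lock:
            # Write to disk first, then update RAM, so that RAM never
            # holds a value that the log does not also hold.
            self._log.write(json.dumps([key, value]) + "\n")
            self._log.flush()
            os.fsync(self._log.fileno())  # ask the OS to push the bytes to the device
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def close(self):
        with self._lock:
            self._log.close()
```

Even this toy shows where the pain comes from. The single lock means that readers block each other for no good reason, which is exactly what finer-grained locking or multiversion concurrency control try to avoid. The log grows forever, every write pays for a trip to the disk, and a half-written record is simply skipped rather than repaired. A real engine has to solve all of these problems at once.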

So, developers almost never write their own custom engines. Some might say that it is an improvement over earlier times when developers absolutely had to craft everything by hand, down to the B-trees. The result was often expensive projects with buggy results.

However, consider that even a bare-metal language like C++ is getting support for concurrency and threads, as well as esoteric features like regular expressions. Moreover, Oracle is working hard at killing the Java Community Process, which will incite Java developers to migrate to better languages.

Meanwhile, in-memory databases are finally practical and inexpensive. Indeed, whereas a 16 GB in-memory database was insane ten years ago, you can order a desktop with 32 GB of RAM from Apple’s web site right now. Moreover, memory capacity grows exponentially: Apple will sell desktops with 1 TB of RAM in 2020. And researchers predict that non-volatile Resistive RAM (RRAM) may replace DRAM. Non-volatile internal memory would make persistence much easier.

But why would you ever want to write your own database engine?

  • For speed, some engines force you to use nasty things like stored procedures. It is a drastically limited programming model.
  • The mismatch between how the programmer thinks and how the database engine works can lead to massive overhead. As crazy as it sounds, I can see a day when writing your own engine will save time. Or, at least, save headaches.
  • Clever programmers can write much faster specialized engines.

Obviously, programmers will need help. They will need great libraries to help with data processing, data compression, and data architecture. Oh! And they will need powerful programming languages.