Column stores and row stores: should you care?
Most database users know row-oriented databases such as Oracle or MySQL. In such engines, the data is organized by rows. Database researcher and guru Michael Stonebraker has been advocating column-oriented databases. The idea is quite simple: by organizing the data into columns, we can compress it more efficiently (using simple ideas like run-length encoding). He even founded a company, Vertica, to sell this idea.
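To make the compression argument concrete, here is a minimal sketch of run-length encoding applied to the same data in a row layout and in a column layout. The table, its column names, and the `run_length_encode` helper are all made up for illustration; this is not how Vertica or any particular engine stores data.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

# Hypothetical table of (country, year) rows, sorted by country.
rows = [("CA", 2007), ("CA", 2008), ("CA", 2008), ("US", 2007), ("US", 2008)]

# Row layout: values from different columns are interleaved on disk.
row_layout = [value for row in rows for value in row]

# Column layout: all values of one column are stored contiguously.
country_column = [country for country, _ in rows]

row_runs = run_length_encode(row_layout)         # no two neighbors are equal
column_runs = run_length_encode(country_column)  # long runs of equal values

print(len(row_runs), column_runs)
```

In the row layout, neighboring values come from different columns, so there are no runs to collapse. In the sorted column, the five country values shrink to two pairs.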
Daniel Tunkelang is back from SIGMOD: he reports that column-oriented databases have grabbed much mindshare. While I did not attend SIGMOD, I am not surprised. Daniel Abadi was awarded the 2008 SIGMOD Jim Gray Doctoral Dissertation Award for his excellent thesis on Column-Oriented Database Systems. Such great work supported by influential people such as Stonebraker is likely to get people talking.
But are column-oriented databases the next big thing? No.
- Column stores have been around for a long time in the form of bitmap and projection indexes. Conceptually, there is little difference. (See my own work on bitmap indexes.)
- While it is trivial to change or delete a row in a row-oriented database, it is harder in column-oriented databases. Hence, applications are limited to data warehousing.
- Column-oriented databases are faster for some applications, sometimes by two orders of magnitude, especially on low-selectivity queries. Yet part of these gains is due to the recent evolution of our hardware. Hardware configurations where reading data sequentially is very cheap favor a sequential organization of the data, such as column stores. What might happen in the world of storage and microprocessors in the next ten years?
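The point about updates can be illustrated with a toy model. The sketch below assumes a hypothetical three-column table held once as a list of records and once as one list per column; the function names are invented for this example, and real engines use more sophisticated strategies than deleting in place.

```python
# Hypothetical three-column table held in two layouts.
row_store = [("alice", "CA", 30), ("bob", "US", 25), ("carol", "US", 41)]
column_store = {
    "name": ["alice", "bob", "carol"],
    "country": ["CA", "US", "US"],
    "age": [30, 25, 41],
}

def delete_row_from_row_store(table, index):
    """Remove one contiguous record; return the number of structures touched."""
    del table[index]
    return 1

def delete_row_from_column_store(table, index):
    """Remove the same position from every column; return the columns touched."""
    for column in table.values():
        del column[index]
    return len(table)

row_touched = delete_row_from_row_store(row_store, 1)
column_touched = delete_row_from_column_store(column_store, 1)

print(row_touched, column_touched)
```

The row store touches a single record, whereas the column store must touch every column. If the columns were also compressed, each of those changes could require decoding and re-encoding part of the column, which is one reason the applications tend toward read-mostly workloads.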
I believe Nicolas Bruno said it best in Teaching an Old Elephant New Tricks:
(…) some C-store proponents argue that C-stores are fundamentally different from traditional engines, and therefore their benefits cannot be incorporated into a relational engine short of a complete rewrite (…) we (…) show that many of the benefits of C-stores can indeed be simulated in traditional engines with no changes whatsoever. Finally, we predict that traditional relational engines will eventually leverage most of the benefits of C-stores natively, as is currently happening in other domains such as XML data.
That is not to say that you should avoid Vertica’s products or stay away from research on column-oriented databases. However, do not bet your career on them. The hype will not last.
(For a contrarian point of view, read Abadi and Madden’s blog post on why column stores are fundamentally superior.)