, 1 min read
Which should you pick: a bitmap index or a B-tree?
Morteza Zaker sent me pointer to their work comparing bitmap indexes and B-trees in the Oracle database. They examine the folklore surrounding bitmap indexes—which are often thought to be mostly useful over low cardinality columns (columns having few distinct values, such as gender). Their conclusion is that the Bitmap index is the conclusive choice for data warehouse design for columns with high or low cardinality. Their claim is backed by experiments using columns with millions of distinct values. This confirms the observation made in my earlier post: The mythical bitmap index.
They make an interesting distinction between full cardinality columns, columns where each value appears only once, and high cardinality columns where at least a few values are often repeated (think about storing last names). A bitmap index is inadequate for a full cardinality column because there is no compression possible, and this is probably how the folklore around bitmap indexes came about. Yet, unlike transaction processing, data warehousing is usually not concerned with full cardinality columns.
Reference: Zaker et al., An Adequate Design for Large Data Warehouse Systems: Bitmap Index Versus B-Tree Index, IJCC 2 (2), 2008.
Further reading: Trading compression for speed with vectorization, To improve your indexes: sort your tables! and my research paper Sorting improves word-aligned bitmap indexes.