4th September 2009, 2 min read

Changing your perspective: horizontal, vertical and hybrid data models

Data has natural layouts:

text is written from the first to the last word,
database tables are written one row at a time,
Google presents results one document at a time,
the early recommender systems compared users to other users,
discussions are organized in newsgroups and posting boards by topic,
research papers are organized in journals and conferences,
objects have attributes (a ball is red), and from these attributes we determine similarities between objects.

In database terminology, these are horizontal layouts.

We can rotate these models to create vertical layouts:

Instead of writing text sequentially, we can store the locations of each word in an inverted index.
Instead of writing database tables row-by-row (e.g., Oracle, MySQL), we can write databases column by column (e.g., C-Store/Vertica, LucidDB, Sybase IQ, and my Lemur Bitmap Index Library).
Instead of presenting result sets one document at a time, we present tag clouds and use faceted search to support exploration. Thus, instead of listing documents, we focus on attributes (date, topic, author).
Recommender systems are often more scalable when they compare items instead of users: the most famous example is Greg Linden’s Amazon recommender (if you liked this book, you may like…). For example, the Slope One algorithms outperform many user-to-user algorithms.
The social web started out with topic-oriented newsgroups and posting boards, but it is now dominated by user-centric blogs and social sites (such as Facebook or Twitter). Since then, we have realized that user-oriented blogs can be preferable.
While research papers are published in conferences and journals, I argue that we should turn this around and organize research papers by author through author-specific feeds.
Some AI researchers are suggesting that relations might be primary whereas attributes would be secondary.
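
To make the first two rotations concrete, here is a minimal Python sketch. The function names (to_columns, inverted_index) and the sample data are made up for illustration; real column stores and search engines add compression, sorting and much more.

```python
from collections import defaultdict

def to_columns(rows):
    """Rotate a list of row dicts into a dict of column lists.

    Assumes every row has the same attribute names.
    """
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)

def inverted_index(text):
    """Map each word to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        index[word].append(position)
    return dict(index)

# Horizontal layout: one record per row.
rows = [
    {"title": "A", "year": 2008},
    {"title": "B", "year": 2009},
]

# Vertical layout: one list per attribute.
columns = to_columns(rows)

# Vertical layout for text: one position list per word.
index = inverted_index("to be or not to be")
```

With the vertical layouts, a query that touches a single attribute (all years) or a single word (where does "be" occur?) reads only the data it needs.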

Many of the best solutions are hybrids. For example, text search sometimes requires full-text indexes such as suffix arrays, and Oracle recently announced a row/column hybrid.
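
As an illustration of what a row/column hybrid can look like, here is a minimal Python sketch that cuts a table into groups of consecutive rows and stores each group column by column. This is one possible design chosen for illustration, not a description of Oracle's implementation; the names to_row_groups, scan_column and fetch_row are hypothetical.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups; store each group column by column.

    Assumes every row has the same attribute names.
    """
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        groups.append({name: [row[name] for row in chunk] for name in chunk[0]})
    return groups

def scan_column(groups, name):
    """Read one attribute across the whole table (the vertical strength)."""
    for group in groups:
        yield from group[name]

def fetch_row(groups, group_size, i):
    """Rebuild row i by touching a single group (the horizontal strength)."""
    group = groups[i // group_size]
    offset = i % group_size
    return {name: values[offset] for name, values in group.items()}

rows = [{"id": i, "year": 2005 + i} for i in range(5)]
groups = to_row_groups(rows, group_size=2)
years = list(scan_column(groups, "year"))
third = fetch_row(groups, 2, 2)
```

The group size is the tuning knob: a size of one gives back a row store, and a size equal to the table length gives a pure column store.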

Take-away message: If you are stuck, try to rotate your data model. If neither the vertical nor the horizontal model is a good fit, create a hybrid.