Daniel Lemire's blog


Collaborative Filtering: Why working on static data sets is not enough

As a scientist, it is important to question your assumptions. So far, most of the hard Computer Science research on collaborative filtering has used static data sets such as the Netflix data set. Implicitly, this research assumes that the recommender system affects neither the ratings themselves nor which items get rated. A related assumption is that polls do not change how people vote (thanks to Peter for this observation).

Yet people’s preferences are often constructed during the elicitation process itself. That is, collaborative filtering is a nonlinear problem with a feedback loop: ratings feed into the recommender system, which helps determine what people rate next, which in turn feeds back into the recommender system…
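To make the feedback loop concrete, here is a toy simulation. It is only a sketch under loudly stated assumptions: the recommender (in the hypothetical function `simulate_feedback_loop`) simply shows each user the k most-rated items they have not yet rated, and simulated users rate exactly what they are shown. Neither rule comes from the post; they are just the simplest choices that close the loop between recommendations and ratings.

```python
import random
from collections import Counter


def simulate_feedback_loop(n_users=50, n_items=20, k=3, rounds=5, seed=0):
    """Toy popularity recommender (hypothetical, for illustration only).

    Each round, every simulated user is shown the k most-rated items they
    have not rated yet, and rates exactly those items. Returns a Counter
    mapping item id -> number of ratings received.
    """
    rng = random.Random(seed)
    counts = Counter({item: 0 for item in range(n_items)})
    rated = [set() for _ in range(n_users)]

    # Assumption: the system starts from one random rating per user,
    # playing the role of the initial "static" data set.
    for u in range(n_users):
        item = rng.randrange(n_items)
        rated[u].add(item)
        counts[item] += 1

    for _ in range(rounds):
        for u in range(n_users):
            candidates = [i for i in range(n_items) if i not in rated[u]]
            # Recommend the currently most-rated items (ties broken by item id).
            recs = sorted(candidates, key=lambda i: (-counts[i], i))[:k]
            for item in recs:
                rated[u].add(item)
                # The new rating immediately feeds back into the recommender.
                counts[item] += 1
    return counts


counts = simulate_feedback_loop()
print(counts.most_common(5))
```

Because every new rating changes the counts that drive the next recommendations, the final distribution of ratings depends on the recommender itself, which is exactly what a static data set cannot capture.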

How could a researcher take this into account? Simulating e-commerce sites with human volunteers would be too expensive, so we need to submit simulated users to a recommender system instead. Measuring the usefulness of the recommendations is also tricky: cross-validation error is probably not the only thing you want to study, and diversity might be an important factor too.
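Beyond cross-validation error, a simulation can also report how diverse its recommendations are. The sketch below uses two generic measures chosen here for illustration, not prescribed by the post: catalog coverage (the fraction of items recommended at least once) and the Gini coefficient of item exposure (0 when every item is shown equally often, growing toward 1 as exposure concentrates on a few items). The function names `catalog_coverage` and `gini` are hypothetical.

```python
def catalog_coverage(recommendation_lists, n_items):
    """Fraction of the catalog that appears in at least one recommendation list."""
    shown = set()
    for recs in recommendation_lists:
        shown.update(recs)
    return len(shown) / n_items


def gini(values):
    """Gini coefficient of non-negative counts (0 = perfectly even exposure)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Formula on ascending data with 1-based ranks i:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


print(catalog_coverage([[0, 1], [1, 2]], n_items=10))  # 3 of 10 items shown
print(gini([1, 1, 1, 1]), gini([0, 0, 0, 4]))          # even vs. concentrated
```

Tracking such measures over successive simulated rounds, rather than once on a fixed data set, is one way to see whether a recommender narrows what users end up rating.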

Note 1: If someone out there knows how to simulate users, please get in touch! I have no idea how to do sane user modelling and I need help!

Note 2: Peter also once pointed me to the Iterated Prisoner’s Dilemma problem as something related.