Collaborative Filtering: Why working on static data sets is not enough
6 thoughts on “Collaborative Filtering: Why working on static data sets is not enough”
Hi Daniel,
This related work comes to mind:
http://tinyurl.com/2uolhv
Intelligent Information Access Publications
– A Learning Agent that Assists the Browsing of Software Libraries
– A Learning Apprentice For Browsing
– Accelerating Browsing by Automatically Inferring a User’s Search Goal
I agree that this is a very important complication in the evaluation of collaborative filtering.
To sharpen the point, I think that there are two separate issues:
(1)
The fact that the interactive recommender system influences the users’ behaviors, which, in turn, feed back into the CF system, and so on in a loop. In other words, the CF mechanism is an active part of the very system that it is supposed to learn and judge.
(2)
All the feedback to the collaborative filtering system is conditioned on the fact that the users actually performed an action. All our observations on a product are based on the very narrow and unrepresentative sub-population that chose to express their opinion (implicitly or explicitly) on that product. Naturally, such a population is highly biased to like the product. For example, when we say that “the average rating for The Sixth Sense movie is 4.5 stars” we really mean to say: “the average rating for The Sixth Sense movie AMONG PEOPLE WHO CHOSE TO RATE THAT MOVIE is 4.5 stars”. Now, what is the real average rating for The Sixth Sense across the whole population? Well, that’s hard to know. But the whole population is the one that really counts… (see the sketch after this comment)
I used to be much more concerned about the second issue…
Yehuda
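To make both of Yehuda’s issues concrete, here is a minimal toy simulation of my own (nothing here comes from the post or the comment; every number is arbitrary): exposure is driven by the recommender’s current popularity counts, and only users who happen to like what they were shown leave a rating, so the feedback is both self-reinforcing and self-selected.

```python
# Toy sketch (hypothetical, arbitrary parameters) of the two issues above:
# (1) the recommender's own popularity estimate drives future exposure, and
# (2) ratings are only observed from users who chose to act, i.e. who liked
#     what they saw, so the observed average is inflated.
import random

random.seed(0)

N_USERS, N_ITEMS, ROUNDS = 1000, 50, 20
true_appeal = {i: random.uniform(1, 5) for i in range(N_ITEMS)}  # "ground truth" average appeal
popularity = {i: 1 for i in range(N_ITEMS)}                      # what the CF system sees
observed = {i: [] for i in range(N_ITEMS)}

for _ in range(ROUNDS):
    for _ in range(N_USERS):
        # Issue (1): exposure is proportional to the system's current popularity counts.
        items = list(popularity)
        shown = random.choices(items, weights=[popularity[i] for i in items], k=1)[0]
        opinion = true_appeal[shown] + random.gauss(0, 1)  # the user's noisy private opinion
        # Issue (2): only users who liked what they saw bother to rate it.
        if opinion > 3.5:
            observed[shown].append(min(5.0, max(1.0, opinion)))
            popularity[shown] += 1  # ...and that rating feeds back into future exposure

rated = [i for i in range(N_ITEMS) if observed[i]]
observed_avg = sum(sum(observed[i]) / len(observed[i]) for i in rated) / len(rated)
print(f"true average appeal:     {sum(true_appeal.values()) / N_ITEMS:.2f}")
print(f"observed average rating: {observed_avg:.2f}  (inflated by self-selection)")
```

With these made-up numbers the observed average lands well above the true one, and the most-exposed items are the ones the loop happened to boost early, which is exactly the kind of effect a static data set hides.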
The key challenge seems to be: how do we study these problems rigorously?
[Caveat: I’m a programmer, not a researcher]
The production recommendation systems that I’ve had experience with attempt to avoid self-reinforcing behavior by introducing a degree of randomness. In other words, you determine recommendations based on the user’s rating profile, but then augment them with some percentage of more remotely related items and possibly even a small percentage of unrelated items. I wish I could provide evidence that this helps, but it’s mostly a hack.
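As a rough illustration of that blending hack (my own sketch, not the commenter’s code): the 80/15/5 split below is arbitrary, and `related_items` / `remotely_related_items` are hypothetical callables standing in for whatever the underlying recommender exposes.

```python
import random

def blended_recommendations(user_profile, catalog, related_items,
                            remotely_related_items, k=10, seed=None):
    """Mix closely related, remotely related, and random items (hypothetical interface)."""
    rng = random.Random(seed)
    n_related = int(k * 0.8)              # items close to the user's rating profile
    n_remote = int(k * 0.15)              # more remotely related items
    n_random = k - n_related - n_remote   # a small dose of unrelated items

    recs = list(related_items(user_profile))[:n_related]
    recs += list(remotely_related_items(user_profile))[:n_remote]
    pool = [item for item in catalog if item not in recs]
    recs += rng.sample(pool, min(n_random, len(pool)))
    rng.shuffle(recs)  # avoid always showing the random items in the last slots
    return recs
```

Whether the injected randomness actually breaks the self-reinforcing loop is, as the commenter says, not something the hack itself demonstrates; measuring that would need the kind of controlled experiment discussed below.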
There are a couple of other biases in rating data, though, at least in the area I’m familiar with (music). One is “selection bias”: the fact that people don’t rate everything that’s presented to them, but rather only things they love or hate. The other is that people’s rating behavior can differ substantially from their actual listening behavior (probably more so when their rating profile is public).
It might be possible to model users in the sense of reproducing the distribution of ratings in a dataset like Netflix’s. But I think the bigger challenge for recommendation technology right now is to capture the things we aren’t getting from users, like how to correlate mood with preferences, or how to distinguish true favorites from temporary enthusiasms.
Thanks for the comment. Yes, it would be interesting. We need people to do this. (I can’t — at least not alone.)
I’ve always figured sites powering recommendation systems would need to perform some sort of experimentation on their users to control for the effect of recommendations. This could include selectively omitting recommendations (perhaps altogether for certain items and/or users) to establish control groups.
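One way such a holdout could be implemented (a sketch under my own assumptions, not anything the commenter specified) is to hash each user into a stable bucket, so the same small fraction of users consistently sees no recommendations and can serve as a control group; the experiment name and the 5% fraction are placeholders.

```python
import hashlib

HOLDOUT_FRACTION = 0.05  # placeholder: 5% of users never see recommendations

def in_control_group(user_id: str, experiment: str = "rec-holdout-v1") -> bool:
    """Deterministically assign a user to the control group for this experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 16**8  # stable pseudo-random value in [0, 1)
    return bucket < HOLDOUT_FRACTION

def recommendations_for(user_id: str, recommender):
    if in_control_group(user_id):
        return []  # omit recommendations entirely for the control group
    return recommender(user_id)  # 'recommender' is a hypothetical callable
```

Comparing rating and consumption behaviour between the two arms then gives at least a crude estimate of how much of the observed feedback is driven by the recommendations themselves.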
Regarding note 1, I think a simulation of human behaviour adequate to explore the consequences of ratings on human behaviour would require already knowing the answer, so that’s a circular and prohibitively difficult way of going about things.