Daniel Lemire's blog


How to win the Netflix $1,000,000 prize?

4 thoughts on “How to win the Netflix $1,000,000 prize?”

  1. Daniel Haran says:

    I subscribed to your blog after finding your work on CF for research on Netflix.

    A few things I’m curious about and would appreciate reading your thoughts on:

    - SVMs barely get a mention from competitors. It seems odd that they wouldn’t be used. Do you know whether performance is the issue? They are, of course, one of the worst methods for explainability.

    - Dates seem to have a large effect on ratings, but all I find is people saying they can’t take advantage of them to improve their scores. Have you heard of any good research on this? With several percentage points of variation in the average score over the year, it boggles my mind that nothing could be squeezed out of it.

  2. As for dates… certainly, if you knew when the user entered his ratings, or at least when the user was active, you could leverage this information… but otherwise, it is hard to see where to go.

    There is published work on taking the time factor into account, mostly to dampen older ratings; a small sketch at the end of this comment illustrates the idea.

    I do not have the faintest idea how SVMs would fare on Netflix. However, scalability is definitely a serious issue for all algorithms. Most academic machine learning work is done on relatively small datasets, and Netflix is huge in comparison, though it is still modest compared to what you face in industry.
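
    To make the idea of dampening older ratings concrete, here is a minimal sketch of an exponentially decayed average; the half-life value and the toy data are illustrative assumptions, not something taken from the published work mentioned above.

    ```python
    import math

    def decayed_average(ratings, now, half_life_days=180.0):
        """Average a list of (rating, day_entered) pairs with exponentially
        decaying weights, so older ratings count for less."""
        num = den = 0.0
        for value, day in ratings:
            age = now - day
            weight = math.exp(-math.log(2.0) * age / half_life_days)
            num += weight * value
            den += weight
        return num / den if den else None

    # toy data: (rating, day index when the rating was entered)
    history = [(4.0, 10), (5.0, 300), (2.0, 980)]
    print(decayed_average(history, now=1000))  # dominated by the most recent rating
    ```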

  3. Yehuda Koren says:

    Two comments…

    About explainability:
    I am the last one to argue against the importance of explaining recommendations to the user. (See my talk@Netflix…) However, the seeming tradeoff between parsimonious explanations and accuracy is much exaggerated, especially when considering real-life systems rather than idealized situations. Basically, you want to push as far as you can on both fronts. Top-notch accuracy lets you stay with the more confident recommendations, and/or assume more risk, which allows you to dare to surprise the customer with a not-so-popular recommendation specially tailored for him. Then, at a *second stage*, you may need to explain “why”. To this end, you are going to use the best explanation techniques at your disposal. And BTW, contrary to some beliefs, latent factor models, such as SVD, won’t prevent you from coming up with working explanations (a toy sketch at the end of this comment illustrates one way).

    About dates:
    We did utilize them. The description is scattered across our papers. Overall, they are far less helpful than we had hoped, especially considering the significant transitions over time that are present in the data. Maybe this won’t be true of other datasets.

    Regards,
    Yehuda
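
    As a rough illustration of that last point: one simple explanation strategy for a latent factor model is to point at the items the user has already rated that sit closest to the recommended item in factor space. The factor vectors and titles below are made up for illustration; this is a generic sketch, not a description of any particular system.

    ```python
    import numpy as np

    # toy item-factor vectors, as a latent factor model (e.g. SVD) might learn them
    item_factors = {
        "The Matrix":    np.array([ 0.9,  0.1, -0.2]),
        "Blade Runner":  np.array([ 0.8,  0.2, -0.1]),
        "Love Actually": np.array([-0.7,  0.6,  0.3]),
        "Notting Hill":  np.array([-0.6,  0.7,  0.2]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def explain(recommended, rated_by_user, top=2):
        """List the user's rated items closest to the recommendation in
        latent factor space, as a human-readable reason."""
        v = item_factors[recommended]
        ranked = sorted(rated_by_user,
                        key=lambda title: cosine(item_factors[title], v),
                        reverse=True)
        return ranked[:top]

    # "We suggest Blade Runner because you liked: ..."
    print(explain("Blade Runner", ["The Matrix", "Love Actually", "Notting Hill"], top=1))
    ```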

  4. Jan Christiansen says:

    In my view scale invariance is one of the indicators of a robust algorithm in all of numerical analysis. Any algorithm that can be defeated by a simple transformation is not correctly identifying the information in the data. It was Peter Deuflhard who first introduced me to this insight with a paper on affine invariance.