Daniel Lemire's blog

Netflix: an interesting Machine Learning game, but is it good science?

2 thoughts on “Netflix: an interesting Machine Learning game, but is it good science?”

  1. Jason Adams says:

    I tend to agree with the intuition that the systems being thrown at this are overfitting to the data set. The KorBell system is a hodgepodge of different methods that seems unlikely to generalize to anything else without a lot of tweaking. I also agree that metrics like root mean squared error and mean absolute error have reached the limit of their usefulness (there seems to be a collaborative filtering equivalent of a sound barrier). That said, I guess we can always hope the prize purse will bring someone to the field who makes a breakthrough.
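    The two metrics mentioned here are simple to state in code. A minimal sketch (the rating lists below are invented for illustration, not Netflix data):

```python
import math

def rmse(predicted, actual):
    # root mean squared error: penalizes large errors quadratically
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    # mean absolute error: penalizes all errors linearly
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

predicted = [3.5, 4.0, 2.0, 5.0]  # hypothetical predicted ratings
actual    = [3.0, 4.5, 3.0, 4.0]  # hypothetical true ratings
# RMSE is always >= MAE; they coincide only when all errors have equal magnitude
```

    Because RMSE weighs large errors more heavily, the two metrics can rank two recommender systems differently on the same data.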

  2. Yehuda Koren says:

    Daniel,

    You made a blatant statement: “do not think that the next step in collaborative filtering is to find ways to improve accuracy according to some metric. I think this game got old circa 2000”. My equally blatant response is that competitors found out fairly quickly that the methods developed up to circa 2006 had grown old, and could not lead to significant improvements or further insight into the data. That is why quite a few innovations were developed by competitors during the past year, thanks to the Netflix challenge.

    It will take some time to fully recognize and appreciate these innovations. Certainly, better familiarity with the methods themselves is required. The chosen RMSE error measure (an excellent choice in my eyes, but that’s another topic) tends to understate the impression of progress, due to that square root…
    However, a deeper look into the new developments reveals some important contributions to the field: (1) Improvement in accuracy will definitely have an impact on user experience; e.g., our studies show that an 8% drop in RMSE means a very significant improvement in the quality of the top-K recommendations. (2) Key innovations are not specific to the contest but are general, and can be leveraged by a company like Netflix to obtain further improvements by integrating the extra information that it holds. (3) Almost all of the new methods are scalable and computationally efficient (as demanded by the size of the Netflix dataset, which is much larger than previous ones).
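    The square-root effect is easy to check numerically: shrinking every prediction residual by 10% cuts the mean squared error by 19%, yet the reported RMSE falls by only 10%. A small sketch (the residuals are made up for illustration):

```python
import math

def rmse(errors):
    # root mean squared error over a list of prediction residuals
    return math.sqrt(sum(e * e for e in errors) / len(errors))

baseline = [1.0, -0.8, 0.9, -1.1, 0.95]  # hypothetical residuals
improved = [e * 0.9 for e in baseline]   # every residual shrunk by 10%

mse_drop  = 1 - (rmse(improved) / rmse(baseline)) ** 2  # 19% improvement in MSE
rmse_drop = 1 - rmse(improved) / rmse(baseline)         # reads as only a 10% drop in RMSE
```

    In general, a relative reduction of r in MSE shows up as roughly r/2 in RMSE, which is why headline RMSE numbers on the Netflix leaderboard moved so slowly.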

    I sympathize with your wish to think bigger, beyond improving prediction error, but we should never forget the basics and the important impact they have on the quality of recommenders.

    Best wishes for the new year,
    Yehuda