Daniel Lemire's blog


After Netflix? What next?

7 thoughts on “After Netflix? What next?”

  1. Hi all
    What about research on the upper performance limit of RC systems? If there were a good model telling us the performance to expect for a given setup (“data topology”) according to certain metrics and methods, we could cut down on trial-and-error procedures. I think it would give us new inputs and insights.
    Cheers
    Marcel

  2. Sylvie Noel says:

    Testing user satisfaction should be fairly straightforward. I’d use a before/after method, with the same people in both conditions. You can either ask people directly how good the recommendations are, or you can take a behavioural approach: ask people to click on the items that interest them and see whether they click on more items in the after condition than in the before condition (a small sketch of this test appears after the comments).

  3. @blattner

    From a machine-learning perspective, it is likely to be extremely difficult to compute such a bound on the accuracy unless you specify what type of algorithm you are allowed to use, or can make some assumptions about the data. Indeed, it is always possible that there is some unknown exotic structure within the data that can be exploited. How do you prove that there is no such structure? Hard.

    From a practical user perspective, we can obviously bound the accuracy by how well human beings can guess their own ratings. Frankly, when I go back to my past ratings, I am sometimes surprised by how highly or lowly I rated certain items. However, this “accuracy” will depend on the user and their context. For example, maybe one person always gives a rating of 3 to all items, no matter what. In this case, clearly, it is easy to make perfect predictions. Other users will be more frivolous, changing their minds from day to day. So, it is unlikely that there is some universal constant for this inaccuracy out there (the second sketch after the comments illustrates the idea).

  4. @lemire
    I agree with your thoughts about “self-correlation” (see also http://www.apparentwind.com/navigation/videos.html, section Reliability).
    However, as a physicist I believe in simple but controllable models. I think one can do quite a lot :-). One big class of algorithms uses an overlap-based approach (common rated items between users, or a common audience shared by the two objects in question). For such a class we could make some assumptions: the way people rate objects obviously depends on what prior information they have about the objects they rate. Take movies: everybody pre-selects and is influenced by many sources. So the probability density over the rating space is clearly shaped by that fact, and I would expect a right-shifted (Gaussian?) distribution over the rating space. Furthermore, we could set up different distributions for which movies a user will rate. From these simple facts alone, we could build a small model and compute the expected error (e.g., the RMSE) for different levels of correlation.
    Now take something like jokes. You don’t have any prior information when somebody tells you a joke, so I would expect a much broader distribution over the rating space. And indeed, when I compare the MovieLens and Jester distributions, they differ in exactly that manner. I don’t think we could build a model that tells us the whole story about every RC system, but I think we could build a good one that tells us whether a certain method makes sense for a particular situation (data). A toy version of this idea appears in the third sketch after the comments.
    cheers
    Marcel

  5. @blattner

    Interesting take on the subject.

  6. Mr. Gunn says:

    I haven’t been following this really closely, but are any of the winning algorithms actually cheap enough for Netflix to use in production?

  7. @Gunn

    This is a valid question. Netflix will probably not put these algorithms into practice “as is”, for scalability and business reasons.
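
A minimal Python sketch of the before/after test Sylvie Noel describes in comment 2. The participants, click logs, and the simple click-count metric are all invented for illustration; a real study would need many more participants and a proper significance test on the paired differences.

```python
# Within-subjects before/after test: the same participants see
# recommendations before and after the change, click on items that
# interest them, and we compare click counts per user.
from statistics import mean

def clicks_per_user(click_log):
    """click_log: dict mapping user id -> list of clicked item ids."""
    return {user: len(items) for user, items in click_log.items()}

# Invented click logs for three participants (same people twice).
before = {"u1": ["a", "b"], "u2": ["c"], "u3": []}
after  = {"u1": ["a", "b", "d"], "u2": ["c", "e"], "u3": ["f"]}

b = clicks_per_user(before)
a = clicks_per_user(after)

# Paired differences: positive means a user clicked on more items
# with the new recommendations than with the old ones.
diffs = [a[u] - b[u] for u in before]
print("mean improvement in clicks per user:", mean(diffs))
```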
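The second sketch illustrates the “human bound” from comment 3: if a user cannot reproduce their own past ratings, the RMSE between two rating sessions by the same user gives a rough floor on how accurately any algorithm could predict them. Both sessions are made up, including the perfectly consistent user who always answers 3.

```python
# Self-consistency as an accuracy bound: compare two rating sessions
# by the same user and compute the RMSE between them.
from math import sqrt

def rmse(xs, ys):
    """Root-mean-squared error between two equal-length rating lists."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# A frivolous user whose ratings drift from one session to the next.
first_pass  = [4, 2, 5, 3, 1]
second_pass = [3, 3, 5, 4, 2]
print("frivolous user self-RMSE:", round(rmse(first_pass, second_pass), 3))

# A user who always answers 3: trivially predictable, self-RMSE of 0.
print("constant user self-RMSE:", rmse([3] * 5, [3] * 5))
```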
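Finally, a toy version of the model Marcel proposes in comment 4: ratings on a 1-to-5 scale drawn from a clipped Gaussian, right-shifted and narrow for pre-selected items (movies), wider for items rated with no prior information (jokes), using the RMSE of always predicting the mean as a simple no-personalization baseline. The means and standard deviations below are assumptions, not values fitted to the MovieLens or Jester data.

```python
# Toy rating model: a clipped Gaussian over the 1-5 rating scale.
# Pre-selected items (movies) get a right-shifted, narrow distribution;
# blind-rated items (jokes) get a broader one.
import random
from math import sqrt

def simulate_ratings(mu, sigma, n, lo=1.0, hi=5.0):
    """Draw n ratings from N(mu, sigma), clipped to the rating scale."""
    return [min(hi, max(lo, random.gauss(mu, sigma))) for _ in range(n)]

def rmse_of_mean_predictor(ratings):
    """RMSE when we always predict the average rating (a crude baseline)."""
    m = sum(ratings) / len(ratings)
    return sqrt(sum((r - m) ** 2 for r in ratings) / len(ratings))

random.seed(0)
movies = simulate_ratings(mu=3.8, sigma=0.8, n=100_000)  # pre-selected
jokes  = simulate_ratings(mu=3.0, sigma=1.6, n=100_000)  # no prior info

print("movie-like data, baseline RMSE:", round(rmse_of_mean_predictor(movies), 3))
print("joke-like data,  baseline RMSE:", round(rmse_of_mean_predictor(jokes), 3))
```

The broader distribution yields a visibly larger baseline error, which is the qualitative point Marcel makes about jokes versus movies.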