Daniel Lemire's blog


Is MapReduce obsolete?

7 thoughts on “Is MapReduce obsolete?”

  1. Seb says:

    It does seem like we need methods that refine incrementally.

    With the effervescence of activity in social media (many true experts now getting into that game) and the increasingly rapid creation of new knowledge, models, and frameworks, knowledge is becoming obsolete faster, and it seems more and more important to know about activity occurring in the present, even as it builds on the past.

  2. Seb says:

    (My argument is that there is a necessary coevolution between search and the kind of fine-grained open collaboration that is now emerging on the Web.)

  3. Greg Linden says:

    It’s true you don’t want to use a system designed for large scale batch processing for tasks that aren’t large scale batch processing.

    For example, there was an article a while back talking about how GMail ran into problems because the data storage was ultimately layered on top of GFS, which isn’t designed for random access workloads:


    That being said, I think the Register article is badly overstated. Incremental index updates are run out of Bigtable, but full index rebuilds are probably still run out of MapReduce/GFS. Moreover, Bigtable itself is layered on top of GFS.

  4. Itman says:

    1. Stonebraker, IMHO, created a huge mess, because MapReduce is nowhere close to a database system. The whole comparison does not make much sense.

    2. Most data is still static. Dynamic data, of course, needs special treatment.

  5. kristina says:

    Google insiders’ reactions to that article were “What? That’s a weird reading. Buh? Lipkovitz was misquoted!” and “what a crappy article.”

    The article is, apparently, very misleading and, in places, downright wrong. Google built something cool and new, but in no way are they moving away from MapReduce.

    And, on a personal note, Stonebraker seems like an ass.

  6. Rome says:

    I know exactly what happened but can’t say because of an NDA; the article is not far from the truth.

  7. arkady says:

    (a) MapReduce is a wonderful tool for a large open class of problems.
    (b) from the very start, Google had other ways to run distributed processing atop GFS (not just MapReduce).
    (c) if I believed in conspiracy theories, I’d say that Google’s original paper about MapReduce was a smart decoy, meant to create confusion among the “fast followers” and send them down the wrong trail by downplaying the importance of GFS (relative to MapReduce).
    One of the reasons for Hadoop’s success is that (unlike other similar attempts) it focused on its file system from the very start.
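For readers unfamiliar with the programming model the comments debate, here is a minimal single-machine sketch of MapReduce-style word counting in Python. It is a toy, not anyone's actual implementation: the function names (`map_phase`, `reduce_phase`) are invented for illustration, and real systems shard both phases across many machines with a shuffle step in between.

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reducer: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the map reduce model", "the model is batch oriented"]
print(reduce_phase(map_phase(docs)))
```

Note the batch character of the model: if one new document arrives, you rerun the whole job over all documents, which is precisely the incremental-update limitation the comments above discuss.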