Daniel Lemire's blog


Final Word on SIAM Data Mining 2007

So, the conference is over. For me, this was a pretty good experience: I was not sick, I met cool people, and some folks appreciated my work. The conference was well organized: the coffee was good and the hotel was well chosen. For people who know me, this is quite a review, since I usually complain a lot about my trips.

However, I am a tad disappointed. Actually, I was disappointed the minute I looked at the list of accepted papers. Data Mining has lost its way.

What is Data Mining? It seems that people have totally forgotten what it is about. No, Data Mining is not Machine Learning, though Machine Learning can be applied to Data Mining problems. Data Mining is primarily concerned with very large data sets: that is its essence. Any algorithm running in quadratic time with respect to the size of the data set is automatically out.
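To see why quadratic time is a non-starter at this scale, here is a rough back-of-the-envelope sketch. The function name and the figure of one billion simple operations per second are my own assumptions for illustration, not measurements of any real system.

```python
def estimated_seconds(n, ops_per_second=1e9):
    """Return rough (linear, quadratic) running times in seconds for n items.

    ops_per_second is an assumed, hypothetical machine speed.
    """
    linear = n / ops_per_second
    quadratic = n * n / ops_per_second
    return linear, quadratic


SECONDS_PER_YEAR = 3600 * 24 * 365

for n in (10**6, 10**9):
    linear, quadratic = estimated_seconds(n)
    years = quadratic / SECONDS_PER_YEAR
    print(f"n={n:.0e}: linear ~{linear:g} s, "
          f"quadratic ~{quadratic:g} s (~{years:.3g} years)")
```

Under these assumptions, a million items is fine either way, but at a billion items the linear pass takes about a second while the quadratic one takes decades.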

Data Mining is not only about prediction or classification. Data Mining is also about visualization, explanations, approximations, databases, Business Intelligence, and so on. It is about applying MapReduce to large data sets. It is about scaling up to billions of data points. It is about dirty data.

Something is wrong with the review process: obviously, the program committee is overly focused on Machine Learning. I cannot complain because my paper was accepted, but, surely, a broader range of papers should have been accepted.