Daniel Lemire's blog


Statistics is overrated: the rise of data science

9 thoughts on “Statistics is overrated: the rise of data science”

  1. James says:

    I would partly disagree. Yes, some branches of classical statistics are ripe for takeover. That doesn’t mean that there’s no great theoretical work going on in statistics. “What is the best prior to put on a complicated distribution?” is an example of a question where stats remains relevant. However, from an applied perspective, I think that statistics without data analysis is at best marginally useful.

  2. Jouni says:

    There seem to be two largely separate fields called “statistics”. One is classical statistics, which has its roots in the social sciences and is also widely used in medicine.

    The other, more computational field has been prominent since at least the 90s. Some of its practitioners call themselves statisticians, while others prefer to be called mathematicians, computer scientists, physicists, or electrical engineers. Regardless of what they call themselves, everyone seems to be doing more or less the same thing. As far as I understand, “data science” is supposed to be a new name for this field of computational statistics, though it sounds more like marketing BS, in the same way as “big data”.

  3. Tom Dietterich says:

    With all due respect, you need to do some reading about inductive inference and the history of science. The goal of statistical inference is to draw conclusions about the real world from data, whereas in computer science, we are mostly just interested in exploiting statistical regularities to make predictions, filter spam, and so on. You had better hope that the people doing medical research are performing randomized trials and controlling for the type I errors of their causal inferences. Statistics did NOT start in the social sciences, although social science is where statistics are most commonly abused. Statistics started in physics and chemistry, but took its biggest leaps in agricultural research under R. A. Fisher. Recommended blogs: Andrew Gelman (http://andrewgelman.com/) and Deborah Mayo (http://errorstatistics.com/).

    1. I am a long-time follower of Andrew Gelman, and he has referenced my own blog on at least one occasion.

      Medical research is a mess specifically because of statistics.

      1. Alan says:

        Hi, Daniel! I’m interested in hearing more about your view that statistics has made medical research a mess. Can you please elaborate? I intend to get my Statistics degree in medical research, but I am having doubts about its efficacy myself. I would love to get your opinion.

        1. Please look up the work of John Ioannidis, starting with “Why Most Published Research Findings Are False”.

    2. Jouni says:

      Where do you think the word “statistics” comes from? Its literal meaning is something like “(the science) of the state”.

      Statistics started as the study of demographic and economic data, and later widened its scope to the gathering and analysis of data in general. Because of this legacy, some traditional universities still place statistics in the Faculty of Social Sciences.

  4. Visgean Skeloru says:

    Well, your conclusion seems to depend very much on where you draw the line between statistics and data science. For some people, at least, they are the same thing.

  5. DataSage says:

    Stupid you are not. But ignorant and stupid, this blog post is. Arrogant and prideful, you are. You want to throw the baby out with the bath water. Computer scientist, I am, but sadly some computer scientists let it go to their head and start trashing other fields that they fully not understand. If people don’t know what a p-value is, that says more about them than about statistics. Common sense.