Daniel Lemire's blog


Deep learning: the silver bullet?

13 thoughts on “Deep learning: the silver bullet?”

  1. Ivan Shekerev says:

    It seems to me that extinction events are necessary.

    They clear away the obsolete and are a source of renewal.

    Like spring.

  2. I think that there is at least one example of connectionist models solving general intelligence: humans!

    1. Yes, I do think that humans are a form of quite “general” intelligence. We can solve general tasks, we are useful (I am not sure that this is up for debate), and AI at human level would be too.

    2. Yes, “Deep Learning” ≠ “Brain”, but I could imagine that we can go from Deep Learning to brain-style performance without passing through a disruptive change in science. At some point “System 2 style” reasoning will be necessary for general AI, but, again, humans can do that with connectionist models.

    1. I do think that humans are a form of quite “general” intelligence.

      It is already the case that cheap computers using little power can solve problems that no human being could ever solve in a reasonable amount of time. So it seems that there is no question about the lack of generality of our brains. We can’t come close with our own computers in basic cognitive tasks.

  3. I totally agree with your words: “machine learning is a vast field with many classes of techniques”. That’s why I welcome and follow developments like IBM’s Watson, which continue in the tradition of the “symbolic school”, with algorithms that are less “black box” than neural networks.

  4. Dear Daniel, I apologize in advance for taking the floor, but deep learning and AI are topics about which I feel very passionate, and I work hard to keep up.

    I agree there is a lot of hype in deep learning (DL), but that’s in large part justified by the awesome results obtained on many difficult problems which were, until recently, considered out of reach for artificial intelligence.

    Just last year, AlphaGo beating the Go grandmaster Lee Sedol and the Google Neural Machine Translation bridging the gap with human translators fulfilled two old quests of the Grail which date back to the early days of AI. These were historic moments.

    That said, in machine learning there is a well-known theorem called « no free lunch », which can be loosely interpreted as saying that there are no a priori distinctions between learning algorithms. This means that there is no « silver bullet » nor « magic pill » in machine learning. It always depends on the datasets and the intended target.

    Deep learning is powerful and is now the dominant solution for computer vision, speech recognition and machine translation. But there is a price to be paid… as you rightly wrote, that’s the huge quantity of data needed (Google-sized) to train any reasonable deep network and the computing power in keeping with it.

    Moreover, neural networks are black boxes, and human interpretability is one of the biggest challenges of deep neural networks.

    That leaves plenty of room for more classical machine learning algorithms like XGBoost, SVM, KNN, linear models, random forests and others…

    In conclusion, research proceeds in waves, and now it’s the deep learning wave. Hype is not always a bad thing since it motivates people and allows an innovation to be pushed to its limits.

    In my opinion, deep learning is a real breakthrough and moves us closer to strong AI, but I would be very surprised if deep learning were the end of the road.

    1. In my opinion, deep learning is a real breakthrough and moves us closer to strong AI, but I would be very surprised if deep learning were the end of the road.

      I agree with that.

  5. Mahadevan Iyer says:

    Thanks for an informative, interesting post.

    However some advice to students, learners, and industry professionals excited about blindly applying Deep Neural Networks to everything:

    1. Beware of over-fitting and bringing unnecessary complexity to what may turn out to be a problem with a simple structure.
    Remember that so far the killer apps for DNNs have been problems with inherent complexity like understanding hierarchically structured data produced by nature e.g. speech, vision, etc. If you are analysing business metrics for example, DNNs may be overkill and may even give poor generalization in the field if not designed well.

    2. Use a multiple-learner approach where you try different activation functions and combine or select from their outputs in some manner, just as is done in the bagging or boosting approaches in decision tree methods.
    I will give you a simple example of using the wrong activation function that is quite realistic for many predominantly linear problems:
    Consider a regression problem where you know that its structure is predominantly linear. There is one input x and one output y which you know can be modeled well as
    y = mx + e where e is minor noise that can be neglected for practical accuracies. You *know* this beforehand. What you don’t know is the value of m. Then the problem is simply to find m. In this case the solution is trivial:
    – Take any one sample point (x,y) with x ≠ 0 and find m as y/x.
    However, consider what happens when you don’t apply your domain knowledge of this problem and blindly apply a sigmoidal neural network to it just because everyone else is doing it. Say you start with a single-layer perceptron, i.e. y’ = sigmoid(ax). Why are you choosing a sigmoid activation function here? Because the Universal Approximation Theorem told everyone that a 3-layer network of sigmoids can approximate anything to any accuracy.
    But for our simple linear problem you will then find that no matter what value of a you converge to in your training, you will never recover the correct value m = y/x! This is simply because mathematically the function sigmoid(mx) is close to mx only for sufficiently small x.
    Even if you keep madly deepening the network with one sigmoid layer after another to try to improve your accuracy, you will still find that it works as well as the linear function only for small x.
    The lesson here is to use your domain knowledge or intuition of the problem to choose the right activation function or model.

    3. Always consider the tradeoff between accuracy and cost of implementation and operation, for both the training and the inference phases.
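The linear-versus-sigmoid example in point 2 can be checked numerically. The following is only a rough sketch under stated assumptions: the slope `M_TRUE`, the sample range, and the grid of candidate weights are made up for illustration, and `tanh` stands in for the sigmoidal activation because it is zero-centred and behaves like its argument near the origin (the logistic sigmoid is bounded in the same way). A grid search replaces real training to keep the code short.

```python
import math

def mse(predict, xs, ys):
    """Mean squared error of a one-input predictor over the samples."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical noiseless data y = m * x; the slope and range are assumptions.
M_TRUE = 2.0
xs = [i / 2 for i in range(-10, 11)]  # -5.0, -4.5, ..., 5.0
ys = [M_TRUE * x for x in xs]

# Domain knowledge: the model is linear, so one nonzero sample gives m.
x0, y0 = xs[-1], ys[-1]
m_hat = y0 / x0
linear_mse = mse(lambda x: m_hat * x, xs, ys)

# Blind approach: y' = tanh(a * x). Search a coarse grid of weights a
# (a stand-in for training) and keep the one with the lowest error.
candidates = [k / 100 for k in range(1, 2001)]  # 0.01 .. 20.00
best_a = min(candidates, key=lambda a: mse(lambda x: math.tanh(a * x), xs, ys))
sigmoid_mse = mse(lambda x: math.tanh(best_a * x), xs, ys)

print("linear  MSE:", linear_mse)
print("sigmoid MSE:", sigmoid_mse, "with a =", best_a)
```

Because `tanh` never leaves the interval (-1, 1), no choice of `a` can follow y = 2x once |x| exceeds 0.5, so the sigmoidal error stays large while the linear fit is exact on this noiseless data.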

    1. All right! In other words, don’t use a sledgehammer to kill a fly.

      Because it’s trendy, deep learning gets applied carelessly to anything and everything.

      Along the same lines, I would like to share with you a hilarious blog post by Joel Grus about a recruiter completely lost when facing an obsessive TensorFlow developer… https://goo.gl/2oD6xS

      1. Mahadevan Iyer says:

        > All right! In other words, don’t use a sledgehammer to kill a fly.

        Ha. Well said. Actually in my example, the sledgehammer doesn’t even kill the fly while the rolled-up newspaper does. What the sigmoid does is add more nonlinearity and worsen the mean square error!

        Sometimes though, sigmoid NNs can reasonably approximate Volterra models as I found out in my 1993 Masters Thesis in DSP on Adaptive Nonlinear Echo Cancellation using feedforward nets. The MSE with a single layer NN was worse than the 2nd order Volterra model for the echo canceler but good enough for practical accuracies i.e. echo amplitudes.

        The advantage of using sigmoid NN here is that it can be super-efficiently implemented as a single transistor in analog VLSI.

        Volterra filters are a linear combination of the inputs plus additional nonlinear terms that are higher powers and cross products of the inputs.
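To make that description concrete, here is a minimal sketch of a second-order Volterra filter output for one window of input samples. The function names `volterra2_features` and `volterra2_output` are hypothetical, and the weights are assumed to be given; adapting them (as an echo canceller would) is not shown.

```python
from itertools import combinations_with_replacement

def volterra2_features(taps):
    """Linear terms followed by all squares and cross products of the inputs."""
    linear = list(taps)
    quadratic = [a * b for a, b in combinations_with_replacement(taps, 2)]
    return linear + quadratic

def volterra2_output(weights, taps):
    """Filter output: a weighted sum of the linear and second-order terms."""
    features = volterra2_features(taps)
    if len(weights) != len(features):
        raise ValueError("expected %d weights, got %d" % (len(features), len(weights)))
    return sum(w * f for w, f in zip(weights, features))

# Two input samples give 2 linear terms and 3 second-order terms.
example = volterra2_features([1.0, 2.0])
print(example)
```

The output is still linear in the weights, which is why such filters can be fitted with ordinary least-squares or LMS-style updates even though they are nonlinear in the inputs.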

        I read Joel Grus’ post and it is hilarious! We can make a comedy skit out of this or something..

  6. Mark Van Peteghem says:

    A few years ago I followed the machine learning course by Andrew Ng. He started with linear regression, a mathematical technique that is over a hundred years old! And I think he was right in doing so: why use a neural network for cases where linear regression suffices? Then he explained logistic regression, a simplified neural network for classification, with the advantage over general neural networks that it is guaranteed to converge to the optimal solution.
    More recently I’ve been experimenting with the Spark library. I found decision trees to be a great technique, because they are quite simple and produce as output an if-else tree that is easy to understand, like this:
    If (feature 434 0.0)
    Predict: 0.0
    Looking at all the weights in a neural network will make you none the wiser.
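As an illustration of why such a tree is readable, here is a small sketch of the simplest possible tree, a one-split "decision stump", learned from a toy dataset and printed as an if-else rule. This is not Spark code: the helper names, the toy data, and the exact rule layout are assumptions made for the example.

```python
def majority(values, default):
    """Most common label, ties broken by the smallest label."""
    if not values:
        return default
    return max(sorted(set(values)), key=values.count)

def fit_stump(rows, labels):
    """Return (feature, threshold, left_label, right_label) with the fewest errors."""
    overall = majority(labels, None)
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            left_label = majority(left, overall)
            right_label = majority(right, overall)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    return best[1:]

def describe(stump):
    """Render the stump as a human-readable if-else rule."""
    j, t, left_label, right_label = stump
    return ("If (feature %d <= %s)\n  Predict: %s\nElse\n  Predict: %s"
            % (j, t, left_label, right_label))

# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.0, 5.0), (0.0, 1.0), (1.0, 4.0), (1.0, 2.0)]
labels = [0, 0, 1, 1]
stump = fit_stump(rows, labels)
print(describe(stump))
```

The whole model is the printed rule, so anyone can check which feature drives the prediction, which is the contrast with inspecting network weights drawn above.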

    1. Quite true! Linear regression is reliable, robust and easy to interpret. Furthermore, linear regression is often used to introduce gradient descent, which is THE optimization method for deep neural networks.

      Decision trees, and particularly ensembles of trees (XGBoost, Random Forest, Gradient Tree Boosting), are very powerful learning algorithms for a large range of applications and are often usable directly out of the box. And trees are easy to interpret!

      In fact, real Deep Learning should be reserved for very complex models, typically with hundreds of thousands to many millions of parameters.

      And yes, neural networks are black-boxes.
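The link between linear regression and gradient descent mentioned in this reply can be sketched in a few lines. This is a minimal illustration, assuming a made-up noiseless dataset and a hand-picked learning rate and step count; it is not how a library would fit a linear model in practice.

```python
def fit_line_gd(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m * x + b by batch gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [(m * x + b) - y for x, y in zip(xs, ys)]
        grad_m = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

# Hypothetical data generated from y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]
m, b = fit_line_gd(xs, ys)
print(m, b)
```

The same loop of computing a gradient and stepping against it is what deep learning frameworks run, only with many more parameters and with gradients obtained by backpropagation.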

  7. Nice post. Sure, there is no such thing as GAI. Intelligence is a verb, not a noun. Like language, it emerges and gains substance within a context; outside of it, it is meaningless. What is really interesting are the societal aspects of intelligence, dealing with conflicts between two types of intelligence: us and algorithms that can know us better than we do.

  8. Aleksandr Blekh says:

    Nice post. In regard to the “one overarching algorithm”, you might be interested in Prof. Pedro Domingos’s book “The Master Algorithm” (https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine/dp/0465065708). Here’s his relevant talk at Google: https://youtu.be/B8J4uefCQMc. (Disclaimer: I don’t agree with all ideas expressed in the talk, but I’m sharing it for the sake of comprehensiveness.)