Regarding neural networks: at my last job, we tried to find the most representative set of algorithms to benchmark their performance. There were several things I formed opinions about:
Recommendation systems are too secretive at the moment. You can’t find a real open-source recommendation system that is big enough for you to care about. I think that’s the reason 6 of the 9 neural networks came out better than the regular approaches. It’s still possible that the real, hidden networks are worse than the regular ones, but I just can’t be sure.
Quantization works only when your SNR is high enough, at least without retraining, which makes sense. Classification networks are fairly easy to quantize because lowering the SNR does almost nothing to the result. GANs, I think, are almost impossible to quantize without seeing worse results. Since neural networks started with classification, it was easy to jump on that train, but I no longer think it’s that viable a solution. I believe bfloat16 or TensorFloat-32 is a better answer.
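To make the SNR point concrete, here is a toy sketch (my own illustration, not tied to any particular framework): a symmetric per-tensor int8 quantizer plus the SNR of the round trip. If that SNR falls below what the network tolerates, accuracy goes with it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize to int8 with one shared scale; return the scale so we can dequantize.
float quantize_int8(const std::vector<float>& w, std::vector<int8_t>& q) {
    float max_abs = 0.0f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  // avoid divide-by-zero
    q.resize(w.size());
    for (std::size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(w[i] / scale));
    return scale;
}

// SNR (in dB) of the dequantized weights against the originals.
float snr_db(const std::vector<float>& w, const std::vector<int8_t>& q, float scale) {
    double signal = 0.0, noise = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        double err = w[i] - q[i] * scale;   // quantization noise for this weight
        signal += static_cast<double>(w[i]) * w[i];
        noise  += err * err;
    }
    return static_cast<float>(10.0 * std::log10(signal / noise));
}
```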
It’s fairly hard to get performance out of pruning. Just taking 90% of the values and setting them to zero means nothing for a SIMD matrix multiply. You need to chop out whole channels to make it worthwhile, and even then it’s not that easy to get meaningful performance out of it.
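A toy sketch of the problem (illustrative only): a dense SIMD/BLAS-style mat-vec does exactly the same work whether 90% of the stored values are zero or not; only removing whole rows (channels) actually shrinks the loop.

```cpp
#include <cstddef>
#include <vector>

// Dense mat-vec, row-major: every stored value is multiplied, zero or not.
void matvec_dense(const std::vector<float>& a, const std::vector<float>& x,
                  std::vector<float>& y, std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c)
            acc += a[r * cols + c] * x[c];   // same cost whether a[...] is 0 or not
        y[r] = acc;
    }
}

// Channel pruning: drop whole rows up front, then run the same dense kernel
// on the smaller matrix -- this is where an actual speedup can come from.
void prune_rows(const std::vector<float>& a, std::size_t rows, std::size_t cols,
                const std::vector<bool>& keep, std::vector<float>& out,
                std::size_t& out_rows) {
    out.clear();
    out_rows = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        if (!keep[r]) continue;
        out.insert(out.end(), a.begin() + r * cols, a.begin() + (r + 1) * cols);
        ++out_rows;
    }
}
```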
I don’t think LSTMs are really that good. They’re old and have passed the test of time, but you really need to change all of those sigmoids to a cheaper activation function like ReLU or something else sufficiently easy to compute.
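For illustration (my own sketch, not from any paper): the logistic sigmoid costs an exp() per gate value, while a piecewise-linear “hard” sigmoid or a plain ReLU is just a clamp, which is what I mean by sufficiently easy to compute.

```cpp
#include <algorithm>
#include <cmath>

inline float sigmoid(float x) {        // one exp() per element: relatively expensive
    return 1.0f / (1.0f + std::exp(-x));
}

inline float hard_sigmoid(float x) {   // clamp(0.2*x + 0.5, 0, 1): mul, add, min, max
    return std::min(1.0f, std::max(0.0f, 0.2f * x + 0.5f));
}

inline float relu(float x) {           // the cheap activation suggested above
    return std::max(0.0f, x);
}
```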
degski says:
The article on neural networks is interesting.
I have dabbled with ANNs since the ’90s. There was no TensorFlow or Caffe; you wrote everything from scratch, and while the research was slowly advancing, it was all printed on paper (imagine). I don’t know where to begin to explain why I think it’s all wrong.
It starts with the usual names, of course, but for me the key figure was Bogdan Wilamowski (http://www.eng.auburn.edu/~wilambm/); he’s not pretty, but he has something to say. He demonstrates that a fully connected cascade network is the most general shape of feed-forward ANN (networks of any shape are captured by this architecture). This implies that optimizing a network comes down to giving it enough nodes to have the plasticity to function: too few and it won’t converge, too many and you get over-learning and wasted computation.
Optimizing the size of the network is easy, because you only have to modify one variable, with no knock-on effects. Dealing with these networks in bulk can be implemented very efficiently using BLAS. I have trained such a network of 5 nodes (yes, five) using GE to play Snake at breakneck speed, while it learned how to play and the snake grew and grew to a large size; it’s impressive what 5 cells can do.
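To show how little machinery this takes, here is a toy sketch of a fully connected cascade forward pass (my own illustration, not code from the repo): node k sees every external input plus the outputs of all earlier nodes, so a single integer, the node count, fixes the whole topology.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// weights[k] holds 1 + n_inputs + k entries: bias, external inputs, then the
// outputs of all previous nodes.
std::vector<float> cascade_forward(const std::vector<float>& input,
                                   const std::vector<std::vector<float>>& weights) {
    std::vector<float> node_out;
    node_out.reserve(weights.size());
    for (std::size_t k = 0; k < weights.size(); ++k) {
        const std::vector<float>& w = weights[k];
        float acc = w[0];                                   // bias
        for (std::size_t i = 0; i < input.size(); ++i)
            acc += w[1 + i] * input[i];                     // external inputs
        for (std::size_t j = 0; j < k; ++j)
            acc += w[1 + input.size() + j] * node_out[j];   // all earlier nodes
        node_out.push_back(std::tanh(acc));                 // any squashing activation
    }
    return node_out;  // the last entries serve as the network outputs
}
```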
The repo: https://github.com/degski/SimdNet . It’s called SimdNet because that’s how it started; it ended up being an ordinary BlasNet ;(. For output I use the new Windows 10 Unicode console functionality, which basically lets you write to the console at will, with no flicker or artifacts (and all ‘ASCII’, like a real snake game). The latter makes it non-portable; the core code is portable, of course.
To conclude, you see the difference: I use 5 nodes and a bit of BLAS on a modest computer, and then there are ‘modern’ ANNs. One has to read the right book.
PS: Wilamowski has also published a very efficient second-order algorithm for ‘backprop’; notably, he decomposes the Jacobian in such a way that the update can be calculated without first fully expanding it, which would be prohibitively expensive.
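A sketch of the general idea as I understand it (my own paraphrase, not his code): a Levenberg-Marquardt-style update needs J^T J and J^T e, and both can be accumulated one Jacobian row at a time, so the full patterns-by-weights Jacobian never has to be materialized.

```cpp
#include <cstddef>
#include <vector>

struct NormalEquations {
    std::size_t n;               // number of weights
    std::vector<double> JtJ;     // n x n quasi-Hessian, row-major
    std::vector<double> Jte;     // n-element gradient-like vector

    explicit NormalEquations(std::size_t n_weights)
        : n(n_weights), JtJ(n_weights * n_weights, 0.0), Jte(n_weights, 0.0) {}

    // Fold in one row of the Jacobian (one pattern/output pair) and its error,
    // then discard the row -- the full Jacobian is never stored.
    void accumulate(const std::vector<double>& j_row, double error) {
        for (std::size_t a = 0; a < n; ++a) {
            Jte[a] += j_row[a] * error;
            for (std::size_t b = 0; b < n; ++b)
                JtJ[a * n + b] += j_row[a] * j_row[b];
        }
    }
    // Afterwards solve (JtJ + mu*I) * dw = Jte for the weight update, as in
    // standard Levenberg-Marquardt.
};
```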
PS: All the literature is on his website; chapters 11, 12, and 13 are the core of his work in this respect (otherwise he seems very busy with the soldering iron).