Failures in the Deep Learning Approach: Architectures and Meta-Parameterization
2017 Mar 26

The biggest challenge the industry currently faces with Deep Learning is undoubtedly computational: the entire market is absorbing cloud services for increasingly complex calculations and investing heavily in GPU computing power. However, even with hardware now a commodity, academia is working on a problem that could revolutionize how Deep Learning is done: the architectural/parameterization aspect. A comment from a discussion thread sums the problem up well, where the user states:
“The main problem I see with Deep Learning: too many parameters.
When you have to find the best value for the parameters, that’s a gradient search by itself. The curse of meta-dimensionality.”
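That "curse of meta-dimensionality" is easy to make concrete. Below is a minimal back-of-the-envelope sketch in Python; the search space is hypothetical, but even this coarse grid over seven everyday knobs already implies thousands of full training runs:

```python
# A minimal sketch of the "curse of meta-dimensionality": even a coarse,
# hypothetical grid over a handful of common hyperparameters explodes
# combinatorially. Real search spaces are usually far larger.
from math import prod

search_space = {
    "num_layers":      [2, 4, 8, 16],
    "units_per_layer": [64, 128, 256, 512, 1024],
    "activation":      ["relu", "tanh", "sigmoid"],
    "learning_rate":   [1e-1, 1e-2, 1e-3, 1e-4, 1e-5],
    "dropout":         [0.0, 0.25, 0.5],
    "batch_size":      [32, 64, 128, 256],
    "optimizer":       ["sgd", "momentum", "adam"],
}

n_configs = prod(len(values) for values in search_space.values())
print(f"{n_configs:,} configurations")  # prints: 10,800 configurations
# At roughly one GPU-hour per full training run, exhaustive grid search
# over even this toy space costs 10,800 GPU-hours.
```

Each of those configurations is itself a full, expensive optimization, which is exactly why tuning the meta-parameters becomes "a gradient search by itself".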
In other words, even with all the hardware available, the question of what constitutes the best architectural arrangement for a deep neural network remains unresolved. The paper by Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah called "Failures of Deep Learning" exposes this problem in rich detail, including experiments (the experiments are available in the paper's Git repository). The authors point to several failure modes of Deep Learning networks: (a) gradients that carry too little information to drive gradient-based parameter optimization, (b) structural problems in how Deep Learning algorithms decompose problems versus training end-to-end, (c) the choice of architecture, and (d) saturation of activation functions.

The upshot is that in a large share of Deep Learning applications, convergence time could be much shorter if these aspects were already resolved. And with them resolved, much of what we know today as the Deep Learning hardware industry would either be heavily underutilized (given the gains coming from architectural/algorithmic optimization) or freed up for more demanding tasks (e.g., image recognition with a low number of pixels). Thus, even as the industry keeps attacking the problem with hardware, there is still much room to optimize Deep Learning networks from an architectural and algorithmic perspective.
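As a quick illustration of point (d), here is a minimal sketch assuming a plain sigmoid unit: its local gradient, σ'(x) = σ(x)(1 − σ(x)), collapses once pre-activations drift away from zero, starving gradient-based optimization of signal.

```python
# Activation saturation in a sigmoid unit: the local gradient
# sigma'(x) = sigma(x) * (1 - sigma(x)) vanishes for large |x|.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [0.0, 2.0, 5.0, 10.0]:
    grad = sigmoid(x) * (1.0 - sigmoid(x))
    print(f"pre-activation {x:5.1f} -> local gradient {grad:.6f}")
# pre-activation   0.0 -> local gradient 0.250000
# pre-activation  10.0 -> local gradient 0.000045  (almost no learning signal)
```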
Below is a list of references, taken directly from Stack Exchange, for those who want to delve deeper into the subject.

Neuro-Evolutionary Algorithms:

- Zaremba, Wojciech, Ilya Sutskever, and Rafal Jozefowicz. "An empirical exploration of recurrent network architectures." (2015): used evolutionary computation to find optimal RNN structures.
- Dernoncourt, Franck. "The medial reticular formation: a neural substrate for action selection? An evaluation via evolutionary computation." Master's Thesis, École Normale Supérieure Ulm, 2011.
- Bayer, Justin, Daan Wierstra, Julian Togelius, and Jürgen Schmidhuber. "Evolving memory cell structures for sequence learning." In International Conference on Artificial Neural Networks, pp. 755-764. Springer Berlin Heidelberg, 2009: used evolutionary computation to find optimal RNN structures (a toy sketch of the idea follows this list).
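For intuition on what these papers automate, here is a toy (1+1)-style evolutionary sketch. The genome, mutation operators, and fitness function are all hypothetical stand-ins (in practice, fitness would be validation accuracy after training each candidate), and this is not the method of any paper above:

```python
# Toy neuro-evolutionary architecture search: mutate a genome of hidden-layer
# widths and keep the child whenever it is fitter ((1+1) hill climbing).
import random

random.seed(0)

def fitness(genome):
    # Hypothetical stand-in for "train the network, return validation score":
    # rewards roughly three layers of ~128 units, penalizes total size.
    target = [128, 128, 128]
    size_penalty = sum(genome) * 1e-4
    shape_bonus = -sum(abs(g - t) for g, t in zip(genome, target))
    return shape_bonus - size_penalty - 100 * abs(len(genome) - len(target))

def mutate(genome):
    child = list(genome)
    op = random.choice(["grow", "shrink", "resize"])
    if op == "grow":
        child.insert(random.randrange(len(child) + 1), random.choice([32, 64, 128]))
    elif op == "shrink" and len(child) > 1:
        child.pop(random.randrange(len(child)))
    else:
        i = random.randrange(len(child))
        child[i] = max(8, child[i] + random.choice([-64, -32, 32, 64]))
    return child

genome = [64]  # start from a single small hidden layer
for generation in range(200):
    child = mutate(genome)
    if fitness(child) > fitness(genome):
        genome = child

print("evolved hidden layers:", genome)
```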
Reinforcement Learning:
- Alvarez, Jose M., and Mathieu Salzmann. "Learning the Number of Neurons in Deep Networks." NIPS 2016. https://arxiv.org/abs/1611.06321
- Baker, Bowen, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. "Designing Neural Network Architectures using Reinforcement Learning." https://arxiv.org/abs/1611.02167
- Zoph, Barret, and Quoc V. Le. "Neural Architecture Search with Reinforcement Learning." https://arxiv.org/abs/1611.01578 (a toy sketch of the RL idea follows this list).
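Again purely for intuition, and far simpler than Zoph & Le's RNN controller or Baker et al.'s Q-learning: the sketch below treats each candidate architecture as a bandit arm, with a synthetic noisy reward standing in for validation accuracy.

```python
# Epsilon-greedy bandit over a discrete set of architectures: a toy
# stand-in for RL-based architecture search.
import random

random.seed(1)

architectures = ["conv3-conv3-fc", "conv5-fc", "conv3-pool-conv3-fc", "fc-fc"]

def reward(arch):
    # Hypothetical noisy proxy for the validation accuracy of a trained model.
    base = {"conv3-conv3-fc": 0.92, "conv5-fc": 0.88,
            "conv3-pool-conv3-fc": 0.94, "fc-fc": 0.80}[arch]
    return base + random.gauss(0, 0.02)

value = {a: 0.0 for a in architectures}  # running mean reward per arm
count = {a: 0 for a in architectures}

for step in range(300):
    if random.random() < 0.1:                       # explore
        arch = random.choice(architectures)
    else:                                           # exploit best estimate
        arch = max(architectures, key=value.get)
    r = reward(arch)
    count[arch] += 1
    value[arch] += (r - value[arch]) / count[arch]  # incremental mean update

best = max(architectures, key=value.get)
print("controller's pick:", best, "| estimated reward:", round(value[best], 3))
```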
Miscellaneous:
- Andrychowicz, Marcin, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando de Freitas. "Learning to learn by gradient descent by gradient descent." https://arxiv.org/abs/1606.04474
- Dernoncourt, Franck, and Ji Young Lee. "Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification." IEEE SLT 2016 (see the GP sketch after this list).
- Cortes, Corinna, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. "AdaNet: Adaptive Structural Learning of Artificial Neural Networks." arXiv preprint arXiv:1607.01097 (2016). https://arxiv.org/abs/1607.01097: an approach that learns both the structure of the network and its weights.
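As a taste of the Gaussian-process route (in the spirit of Dernoncourt & Lee above, though not their exact setup), the sketch below fits a GP to observed (hyperparameter, score) pairs and picks each next trial by expected improvement. The validation_score function is a synthetic stand-in for a real training run; scikit-learn and SciPy are assumed:

```python
# Minimal GP-based Bayesian optimization of a single hyperparameter
# (log10 of the learning rate), using expected improvement as the
# acquisition function.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def validation_score(log_lr):
    # Hypothetical stand-in for "train a model, return validation accuracy":
    # peaks near learning rate 1e-3 (log_lr = -3), with evaluation noise.
    return np.exp(-0.5 * ((log_lr + 3.0) / 0.7) ** 2) + rng.normal(0, 0.01)

# A few random trials over log10(learning rate) in [-6, 0] to seed the GP.
X = rng.uniform(-6, 0, size=(4, 1))
y = np.array([validation_score(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4,  # jitter for noise
                              normalize_y=True)
grid = np.linspace(-6, 0, 200).reshape(-1, 1)

for trial in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    z = (mu - y.max()) / np.maximum(sigma, 1e-9)
    ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, validation_score(x_next[0]))

print("best log10(learning rate) found:", round(X[np.argmax(y)][0], 2))
```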
P.S.: WordPress removed the justified-text option, so apologies in advance for the blog's amateur appearance over the coming days.