A few days ago, I listened to a podcast with Lex Fridman and Max Tegmark and found myself fixated on an analogy made by Prof. Tegmark: What do airplanes and artificial neural networks have in common? Both are simplified solutions to complex problems that nature solved first, and both work pretty well.
There is no doubt that birds inspired us humans to consider the possibility of flight. However, our solution to flying has not involved mimicking the flapping of birds' wings. Thanks to a much simpler design, we have built aircraft with engines that overcome aerodynamic drag and maintain sufficient speed, so that fixed wings can generate enough lift to compensate for the aircraft's weight. Aircraft of this design allow us to fly in a controlled manner.
Likewise, artificial neural networks have been inspired by how humans learn and by our knowledge of the brain and its neurons. These artificial networks are much simpler than their natural counterparts. While the brain is composed of something on the order of 10¹¹ neurons, with dozens of different neuron types, artificial neural networks are usually composed of simple connections and activations, as unsophisticated as sigmoid functions or piecewise linear functions (see Activation functions review by AFL). By connecting these neurons in an intelligent way, we have been able to solve problems of great interest; good examples are applications in machine translation, computer vision and reinforcement learning.
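To give a sense of just how simple these building blocks are, here is a minimal Python sketch of a sigmoid activation, a piecewise linear activation (ReLU) and a single artificial neuron. The function names and the example weights are my own choices for illustration and do not come from any particular library.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Smooth activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Piecewise linear activation: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def neuron(
    inputs: Sequence[float],
    weights: Sequence[float],
    bias: float,
    activation: Callable[[float], float],
) -> float:
    """A single artificial neuron: weighted sum of the inputs plus a bias,
    passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)
```

For example, `neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)` computes a weighted sum of zero and passes it through the sigmoid. A full network is just many of these units wired together in layers.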
With his analogy, Max Tegmark may have been referring to his paper written in collaboration with Henry Lin and David Rolnick, “Why does deep and cheap learning work so well?”. The authors address the question of why artificial neural networks work by analyzing the problem from a physics point of view and introducing the term “cheap learning”. It is widely known that neural networks can approximate essentially any function, a result known as the universal approximation theorem. Building on this theorem, Tegmark et al. show that the class of functions we need to approximate in real applications is actually narrowed by a set of physical properties. As a result, we are able to learn with exponentially fewer parameters than a generic problem would require. Examples of these properties are symmetry, locality and the fact that low-order polynomials describe many physical systems. Another very relevant property is the hierarchical structure of the physical processes that generate the distributions of interest. This is why deep learning models (namely, neural networks with several layers) can approximate these functions more easily and with fewer parameters than a shallow neural network.
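To make the point about low-order polynomials a bit more concrete, here is a minimal sketch, in the spirit of a construction I understand the paper to use, of how the product of two numbers can be approximated with only four neurons. The choice of softplus as the smooth activation, the scale parameter `lam` and the function names are my own assumptions for illustration, not the authors' code.

```python
import math


def softplus(x: float) -> float:
    """Smooth activation whose second derivative at 0 is 1/4 (nonzero)."""
    return math.log1p(math.exp(x))


def approx_product(u: float, v: float, lam: float = 0.01) -> float:
    """Approximate u * v with four softplus neurons.

    Near zero, softplus(x) + softplus(-x) is a constant plus (1/4) * x**2
    plus higher-order terms. Taking the difference of two such pairs leaves
    (1/4) * lam**2 * ((u + v)**2 - (u - v)**2) = lam**2 * u * v.
    The scale `lam` (an assumed, hypothetical parameter) keeps the inputs
    small so that the higher-order terms stay negligible.
    """
    a = lam * (u + v)
    b = lam * (u - v)
    # Four hidden neurons: softplus applied to +a, -a, +b and -b.
    hidden = softplus(a) + softplus(-a) - softplus(b) - softplus(-b)
    second_derivative_at_zero = 0.25
    return hidden / (4.0 * second_derivative_at_zero * lam ** 2)
```

Calling `approx_product(3.0, -2.0)` returns a value close to -6, and the approximation improves as `lam` shrinks (until floating-point rounding takes over). Once multiplication is this cheap, any low-order polynomial can be assembled from a handful of such gates, which is roughly why the number of parameters stays small.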
I find these ideas, reflections and points of view fascinating, and I strongly recommend you listen to more Lex Fridman interviews. Lex is a researcher at MIT who teaches a deep learning fundamentals class there; as support material for his class, he has produced this podcast, in which he chats with leaders and fellow researchers. What I find exciting about the podcast is its variety: Fridman does not only discuss AI and ML, but branches out into topics such as the mind, consciousness, biology and philosophy. I think this variety and breadth of experience creates an environment in which ideas can flourish, which is particularly refreshing given how easily we lose sight of the bigger picture when pursuing very specific niches and specializations.
References: