2022-04-26
Perched on the shoulders of every AI researcher are two creatures, one the voice of reason and the other a charlatan, but it is yet to be discerned which is which. On one shoulder is Gottfried, who believes that deep learning will only take us so far, and that achieving artificial general intelligence (AGI) will require a deep conceptual breakthrough concerning the nature of intelligence. On the other shoulder is Edmund, who believes that while there are still technical breakthroughs to be had around the margins, scaling large neural networks trained with gradient-based learning is the surest path to AGI. The disagreement between Gottfried and Edmund hinges on a claim I will call the legibility hypothesis, with the former affirming it and the latter denying it:
The Legibility Hypothesis: The algorithm for general intelligence contained within the human genome is legible enough for a human to deduce and reverse-engineer.
The following is a dialogue between Gottfried and Edmund where they arrive at the legibility hypothesis as the crux of their disagreement.
Gottfried: What madness it is that deep learning models require millions of training examples and enough compute to fill a warehouse! Human learning is leaps and bounds more efficient: a child only needs to see one photo of a giraffe to learn what a giraffe is. We are missing a key piece of the puzzle in our understanding of intelligence if we can’t explain how humans generalize so widely from so little data. To make progress towards AGI, we need to take a step back from trillion-parameter models and try to figure out what that piece of the puzzle is.
Edmund: You cannot compare the sample efficiency of a human with the sample efficiency of a randomly initialized neural network. You see, the neural network starts as a blank slate. But the child begins life with a DNA sequence that is the product of millions of years of evolution, and much of its learning is guided by information encoded in its genome. So it is not surprising that a machine needs a lot of trial and error to become intelligent: it took evolution a lot of trial and error to get general intelligence right.
Gottfried: Sure, it may have taken evolution millions of years and a lot of trial and error to produce intelligence, but that doesn’t matter, because the end product is right there encoded in our genome. We don’t have to retrace every step that evolution took; we can just reconstruct the algorithm that’s there in the DNA.
Edmund: In principle, that might be possible. But think about how difficult it would be! The algorithm for general intelligence might be so unintelligible that it is next to impossible for a mere human to figure out. The human general intelligence algorithm came about purely through evolutionary pressure, and the ease with which humans might one day grasp it was never a consideration. Understanding the algorithm may require notions that are entirely alien to a human. We are far better off building machines that discover this algorithm for themselves by learning from data and leveraging massive compute.
Gottfried: I disagree with you. I think that humans are perfectly capable of figuring out how to build complex systems from first principles. Consider the Python interpreter: it’s an extremely complex program with awesome variety in its behavior. And a single programmer was able to write it! Now if a single programmer can deduce from first principles how to build the 1MB Python interpreter, surely the collective effort of AI scientists is enough to crack the 2GB human genome, only a tiny portion of which is devoted to intelligence anyway.
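(An aside to make Gottfried’s size comparison concrete. The sketch below is a hedged back-of-envelope calculation: the roughly 3.1 billion base pair length of the human genome is a standard figure, while the 2-bits-per-base encoding and the 1% “relevant portion” are illustrative assumptions, not measurements.)

```python
# Back-of-envelope: how big is the genome as a file, and how does it
# compare to the dialogue's 1MB interpreter / 2GB genome figures?

BASE_PAIRS = 3.1e9      # approximate length of the human genome
BITS_PER_BASE = 2       # A/C/G/T -> 2 bits; the information-theoretic floor

genome_2bit = BASE_PAIRS * BITS_PER_BASE / 8      # bytes at 2 bits per base
genome_1byte = BASE_PAIRS                         # bytes at 1 byte per base

print(f"2 bits/base:  {genome_2bit / 1e9:.2f} GB")   # ~0.78 GB
print(f"1 byte/base:  {genome_1byte / 1e9:.2f} GB")  # ~3.10 GB

# Illustrative assumption: only ~1% of the genome bears on the trait in
# question (protein-coding DNA is on the order of 1-2% of the total).
relevant = 0.01 * genome_2bit
print(f"1% 'relevant' slice: {relevant / 1e6:.0f} MB")  # ~8 MB
```

The two encodings bracket the 2GB figure in the dialogue, so nothing in Gottfried’s argument turns on the exact number: the genome is at most a few thousand times larger than programs that single engineers have written.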
Edmund: Your Python interpreter example speaks to how complex a system humans can produce from their capacity for reason. But it yields no insight into whether we could crack the genome’s general intelligence algorithm. The question we should ask instead is this: given an arbitrary program drawn from the space of all programs of a given complexity, what is the probability that humans could figure out how it works and reproduce it? I think it is low. Consider how physicists have struggled to decipher the theory of everything, despite the fact that we believe it has a neat mathematical form.
I believe the probability humans can deduce a functional equivalent of the genome’s general intelligence algorithm is low, perhaps 1%. In fact, my belief that creating AGI depends on large models and gradient-based learning hinges on this estimate.
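(To put Edmund’s question in symbols, with notation that is mine rather than the dialogue’s: write $\mathcal{P}_k$ for the set of programs of description length $k$, and $R(p)$ for the event that humans can reverse-engineer program $p$. Edmund is asking about

$$
\Pr_{p \,\sim\, \mathrm{Unif}(\mathcal{P}_k)}\big[\,R(p)\,\big],
$$

and claiming that this probability is small when $k$ is near the size of the genome’s intelligence-relevant portion. His 1% estimate then follows if the evolved algorithm is treated, for our purposes, as a typical draw from $\mathcal{P}_k$.)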
Gottfried: Interesting. I believe that the probability humans can deduce a functional equivalent of the genome’s general intelligence algorithm from first principles is high, perhaps 95%. In fact, my belief that creating AGI depends on a conceptual breakthrough concerning the nature of intelligence hinges on this estimate.
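(A final aside on what “hinges on this estimate” means in practice. The sketch below is a toy decision rule with hypothetical payoffs, chosen only to show how the legibility probability flips which research bet looks better; none of the numbers come from the dialogue except the two probability estimates.)

```python
# Toy decision rule: pick the research bet with the higher expected payoff
# given a subjective probability that the genome's algorithm is legible.
# The payoff numbers are hypothetical illustrations, not claims.

def preferred_strategy(p_legible: float) -> str:
    # First-principles work only pays off if the algorithm is legible;
    # scaling pays off either way, but (hypothetically) less when it is.
    ev_first_principles = p_legible * 1.0
    ev_scaling = p_legible * 0.3 + (1 - p_legible) * 0.8
    return "first principles" if ev_first_principles > ev_scaling else "scaling"

print(preferred_strategy(0.95))  # Gottfried's estimate -> "first principles"
print(preferred_strategy(0.01))  # Edmund's estimate    -> "scaling"
```

Under these made-up payoffs, the disagreement over strategy reduces entirely to the disagreement over p_legible, which is exactly what it means for the legibility hypothesis to be the crux.)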