But even now, LLMs absolutely do have some problem-solving capability, limited as it is.
For example, yesterday I asked GPT-4o to write multiple alternate endings to the short story "The Last Equation". They weren't dramatically compelling, but they were logical and functional.
How is that not problem solving? And so help me, before anyone tells me it's just stringing together the next most likely tokens - I don't care. Clearly that is at least a primitive form of intelligence. Actually it's not even apparent to me that that isn't exactly what human intelligence is doing...
I would define intelligence as "degree of ability to use past experience to predict future outcomes" (which includes reasoning, aka problem solving ability, via repeated what-if prediction, then backtracking/learning on failure etc).
So, intelligence exists on a spectrum - some things are easier to predict given a set of learnt facts and methods than others. The easiest thing to predict (the most basic form of intelligence) is "next time will be the same as last time", which is basically memorization and pattern matching. That is mostly what LLMs are able to do, thanks to brute-force pattern/rule extraction via gradient descent.
Going beyond "next time will be the same as last time" is where reasoning comes in - where you have the tools (experience) to solve a problem, but it requires a problem-specific decomposition into sub-problems and trial-and-error planning/testing to apply learnt techniques to make progress on the problem...
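To make the distinction concrete, here's a toy sketch (not a claim about how LLMs or brains actually work). The number puzzle and the names `OPS`, `recall` and `solve` are all made up for illustration: `recall` is the "same as last time" lookup, while `solve` does the repeated what-if prediction with backtracking on failure.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical "learnt techniques": the only moves available to the solver.
OPS: Dict[str, Callable[[int], int]] = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
}

Plan = List[str]
Memory = Dict[Tuple[int, int], Plan]


def recall(memory: Memory, start: int, target: int) -> Optional[Plan]:
    """'Next time will be the same as last time': pure lookup, no search."""
    return memory.get((start, target))


def solve(start: int, target: int, depth: int) -> Optional[Plan]:
    """What-if search: try a step, predict the outcome, backtrack on failure."""
    if start == target:
        return []  # nothing left to do
    if depth == 0:
        return None  # out of budget: this branch failed, caller backtracks
    for name, op in OPS.items():
        rest = solve(op(start), target, depth - 1)  # "what if I did this?"
        if rest is not None:
            return [name] + rest
    return None  # every option failed at this level


def apply_plan(start: int, plan: Plan) -> int:
    """Run a plan step by step to check where it actually ends up."""
    value = start
    for name in plan:
        value = OPS[name](value)
    return value


memory: Memory = {}
plan = recall(memory, 1, 11)        # never seen before, so lookup fails
if plan is None:
    plan = solve(1, 11, depth=4)    # fall back to trial-and-error search
    if plan is not None:
        memory[(1, 11)] = plan      # "learning on failure": remember for next time
```

The second time the same problem comes up, `recall` answers it without any search, which is the sense in which a solved problem gets demoted from reasoning to pattern matching.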
Certainly a lot of human behavior (applied intelligence) is of the shallow "system 1" pattern matching variety, but I think this is overstated. Not only is "system 2" problem-solving needed for on-the-job training, but I think we're using it all the time when we're doing anything more than reacting mindlessly to the current situation.
So, sure, LLMs have limited intelligence, but it's only "system 1" shallow intelligence, gestalt pattern recognition, based on training-time gradient descent learning. What they are missing is run-time "system 2" problem-solving.
I'm guessing that solving run-time learning may well require a different approach, and it's not clear that reasoning (in general form - ability to dynamically synthesize a problem-specific solution) can be just added to LLMs either (e.g. by adding tree search). There are also other missing components such as working memory that seem simpler to solve.
Coming up with brand new architectures and learning approaches is likely to take time. There have been attempts to find alternatives to gradient descent, but none very successful despite a lot of effort.
Perhaps it's just a reflection of people chasing the low-hanging fruit (and as Chollet says, LLMs "sucking all the oxygen out of the room"), but architectural progress post-LLM has been minimal. In 7 years we've basically just gone from the transformer paper to big pre-trained transformers.
Even when workable architectural approaches to run-time learning and reasoning have been developed, they will also need to be scaled up (another 7 years?), and will also be competing with LLMs for mindshare and dev. resources as long as scaling LLMs continues to be seen as profitable.
The timescale for coming up with new architectures and approaches is hard to predict. AGI prediction timeframes have always been wrong, and the transformer was really one of history's accidental discoveries. Who'd have guessed that a better seq-to-seq model would create such capabilities!
If I had to guess, I'd say human-level AGI (human-level in terms of both capability and generality) is still 15-20 years away at least. Taking 7 years to go from small transformers to big transformers doesn't make me optimistic that architectural innovation is going to happen very quickly, and anyway this is an unpredictable research problem, not an engineering one.