> I'm more shocked that so many people seem unable to come to grips with the fact that something can be a next token predictor and demonstrate intelligence.
Except LLMs have not shown much intelligence. Wisdom yes, intelligence no. LLMs are language models, not 'world' models. It's the difference of being wise vs smart. LLMs are very wise as they have effectively memorized the answer to every question humanity has written. OTOH, they are pretty dumb. LLMs don't "understand" the output they produce.
> To them, if something is a token predictor clearly it can't be doing anything impressive
Shifting the goal posts. Nobody said that a next token predictor can't do impressive things, but at the same time there is a big gap between impressive things and other things like "replace every software developer in the world within the next 5 years."
I think what BoiledCabbage is pointing out is that the fact that it's a next-token-predictor is used as an argument for the thesis that LLMs are not intelligent, and that this is wrong, since being a next-token-predictor is compatible with being intelligent. When mikert89 says "thinking machines have been invented", dgfitz in response strongly implies that for thinking machines to exist, they must become "more than a statistical token predictor". Regardless of whether or not thinking machines currently exist, dgfitz's argument is wrong and BoiledCabbage is right to point that out.
> an argument for the thesis that LLMs are not intelligent, and that this is wrong,
Why is that wrong? I mean, I support that thesis.
> since being a next-token-predictor is compatible with being intelligent.
No. My argument is that this is wrong by definition. It's wisdom vs intelligence. Street-smart vs book smart. I think we all agree there is a distinction between wisdom and intelligence. I would define wisdom as being able to recall pertinent facts and experiences. Intelligence is measured in novel situations, it's the ability to act as if one had wisdom.
A next token predictor by definition is recalling. The intelligence of an LLM is good enough to match questions to potentially pertinent definitions, but it ends there.
It feels like there is intelligence for sure. In part it is hard to comprehend what it would be like to know the entirety of every written word with perfect recall - hence essentially no situation is novel. LLMs fail on anything outside of their training data. The "outside of the training data" is the realm of intelligence.
I don't know why it's so important to argue that LLMs have this intelligence. It's just not there by definition of "next token predictor", which is what an LLM is at its core.
For example, a human being probably could pass through a lot of life by responding with memorized answers to every question that has ever been asked in written history. They don't know a single word of what they are saying, their mind perfectly blank - but they're giving very passable and sophisticated answers.
> When mikert89 says "thinking machines have been invented",
Yeah, absolutely they have not. Unless we want to take the definition of thinking to a reductio ad absurdum.
> they must become "more than a statistical token predictor"
Yup. As I illustrated by breaking down the components of "smart" into the broad components of 'wisdom' and 'intelligence', through that lens we can see that next token predictor is great for the wisdom attribute, but it does nothing for intelligence.
> dgfitz's argument is wrong and BoiledCabbage is right to point that out.
Why exactly? You're stating a priori that the argument is wrong without saying why.
> A next token predictor by definition is recalling.
I think there may be some terminology mismatch, because under the statistical definitions of these words, which are the ones used in the context of machine learning, this is very much a false assertion. A next-token predictor is a mapping that takes prior sentence context and outputs a vector of logits to predict the next most likely token in the sequence. It says nothing about the mechanisms by which this next token is chosen, so any form of intelligent text can be output.
A predictor is not necessarily memorizing either, in the same way that a line of best fit is not a hash table.
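To make the line-of-best-fit vs hash table point concrete, here is a minimal sketch. The data and the rule `y = 3x + 2` are made up for illustration, and `lookup_predict`, `fit_line` and `line_predict` are hypothetical names, not anything from a real library. Both functions are "predictors" trained on the same ten points; only one of them says anything about an input it has never seen.

```python
# Toy training data drawn from a made-up rule, y = 3x + 2 (an assumption
# chosen purely for illustration).
train = {x: 3 * x + 2 for x in range(10)}

def lookup_predict(x):
    """Pure recall: a hash table that only knows the training inputs."""
    return train.get(x)  # None for anything unseen

def fit_line(points):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Two numbers summarize all ten training points.
a, b = fit_line(list(train.items()))

def line_predict(x):
    """Prediction from the fitted parameters, defined for any x."""
    return a * x + b

print(lookup_predict(100))       # the table has no entry for x = 100
print(round(line_predict(100)))  # the fitted line extrapolates
```

The table stores ten entries and is silent outside them; the fit stores two parameters and extrapolates. Whether an LLM's extrapolation counts as "intelligence" is the thing being argued about, but "predictor" does not imply "lookup".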
> Why exactly? You're stating a priori that the argument is wrong without saying why.
Because you can prove that for any human, there exists a next-token predictor that universally matches word-for-word their most likely response to any given query. This is indistinguishable from intelligence. That's a theoretical counterexample to the claim that next-token prediction alone is incapable of intelligence.
I think what you are missing is the concept of generalization. It is obviously not possible to literally recall the entire training dataset, since the model itself is much smaller than the data. So instead of memorizing all answers to all questions in the training data, which would take up too much space, the predictor learns a more general algorithm that it can execute to answer many different questions of a certain type. This takes up much less space, but still allows it to predict the answers to the questions of that type in the training data with reasonable accuracy. As you can see it's still a predictor, only under the hood it does something more complex than matching questions to definitions. Now the thing is that if it's done right, the algorithm it has learned will generalize even to questions that are not in the training data. But it's nevertheless still a next-token-predictor.
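A toy sketch of that distinction, under loud assumptions: the vocabulary is just the integers 0..9, the "training data" is four made-up count-up-by-one examples, and `make_memorizer` / `make_rule_learner` are hypothetical helpers, nothing like how a real LLM is trained. Both return a next-token predictor with the same interface (context in, logits over the vocabulary out); they differ only in what they store.

```python
from collections import Counter

VOCAB = 10  # toy vocabulary: the tokens are the integers 0..9

# Made-up training pairs (context, next_token): "count up by one, mod 10".
train = [((0, 1, 2), 3), ((1, 2, 3), 4), ((2, 3, 4), 5), ((3, 4, 5), 6)]

def one_hot_logits(token):
    """Logits that put all the weight on a single token."""
    return [1.0 if t == token else 0.0 for t in range(VOCAB)]

def make_memorizer(pairs):
    """Stores every (context -> next token) pair verbatim."""
    table = dict(pairs)
    def predict(context):
        if context in table:
            return one_hot_logits(table[context])
        return [0.0] * VOCAB  # flat logits: no preference for unseen contexts
    return predict

def make_rule_learner(pairs):
    """Stores one number: the most common step from last token to next."""
    steps = Counter((nxt - ctx[-1]) % VOCAB for ctx, nxt in pairs)
    step = steps.most_common(1)[0][0]
    def predict(context):
        return one_hot_logits((context[-1] + step) % VOCAB)
    return predict

def argmax(logits):
    """Index of the largest logit (the predicted token)."""
    return max(range(len(logits)), key=logits.__getitem__)

memorizer = make_memorizer(train)
rule_learner = make_rule_learner(train)

unseen = (6, 7, 8)  # not in the training pairs
print(memorizer(unseen))             # flat logits, nothing to recall
print(argmax(rule_learner(unseen)))  # continues the pattern
```

On the training contexts the two are indistinguishable. On `(6, 7, 8)` the memorizer has nothing to recall, while the rule learner, which stored a single step size instead of four answers, continues the sequence. Both are still next-token predictors.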