> There's a difference between some people sometimes needing to be reminded to do something, and them flat out not being able to do it due to fundamental cognitive limitations.
GPT4 isn't "flat out not able to do it" when reminded. My point was that I have had the same experience of having to prompt step by step and go "why did you do that? Follow the steps" with both fully functional, normally intelligent people and with GPT4 for similarly complex tasks, and given the improvement between 3.5 and 4 there's little reason to assume this won't keep improving for at least some time more.
> That's because GPT4 has been custom tuned and trained on that specific task as well, along with many others. It's that training, why it was necessary and how it works that the paper referred to previously was about.
So it can do it when trained, just like people, in other words.
> They will infer from that they work cognitively in the same way as humans
And that would be bad. But so is automatically assuming that there's any fundamental difference between how they work and how human reasoning works, given that we simply do not know how human reasoning works, and given that in an increasing number of areas LLMs show behaviour similar to untrained people when their reasoning breaks down (e.g. failing to fall back on learned rules).
Again, I'm not saying they're reasoning like people, but I'm saying that we know very little about what the qualitative differences are outside of the few glaringly obvious aspects (e.g. lack of lasting memory and lack of ongoing reinforcement during operation), and we don't know how necessary those will turn out to be (we do know that humans can "function", for some values of function, without the ability to form new lasting memories, but obviously it imposes significant functional impairment).
> Again, I'm not saying they're reasoning like people
Cool, that's really the only point I'm making. It's certainly true we can overcome a lot of the limitations imposed by that basic token sequence prediction paradigm, but those are just workarounds rather than general solutions, and are therefore limited in interesting ways.
Obviously I don't know for sure how things will pan out, but I suspect we will soon encounter scaling limitations in the current approach. Not necessarily scaling limitations fundamental to the architecture as such, but limitations in our ability to develop sufficiently well-developed training texts and strategies across so many problem domains. That may be several model generations away though.
To be clear, I'm saying that I don't know whether they reason like people, not that we know they don't.
It's not at all clear that humans do much more than "that basic token sequence prediction" for our reasoning itself. There are glaringly obvious auxiliary differences, such as memory, but we just don't know how human reasoning works, so writing off a predictive mechanism like this is just as unjustified as assuming it's the same. It's highly likely there are differences, but whether they are significant remains to be seen.
> Not necessarily scaling limitations fundamental to the architecture as such, but limitations in our ability to develop sufficiently well-developed training texts and strategies across so many problem domains.
I think there are several big issues with that thinking. One is that this constraint is an issue now in large part because GPT doesn't have "memory" or an ability to continue learning. Those two need to be overcome to let it truly scale, but once they are, the game fundamentally changes.
The second is that we're already at a stage where using LLMs to generate and validate training data works well for a whole lot of domains, and that will accelerate, especially when coupled with "plugins" and the ability to capture interactions with real-life users [1].
E.g. a large part of the human ability to do maths with any kind of efficiency comes down to rote repetition, and generating large sets of simple quizzes for such areas is near trivial if you combine an LLM with tools that can validate its answers. And unlike with humans, where we have to repeat this effort for billions of individuals, once you have the ability to let these models continue learning you make this investment in training once (or once per major LLM effort).
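To make that concrete, here's a minimal sketch of the kind of quiz generation I mean. The `ask_llm` helper is a hypothetical placeholder for whatever model API you'd actually call, not a real library function; the point is just that the ground truth comes from ordinary code, so the model's answers can be checked and filtered automatically.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to whatever LLM API is in use."""
    raise NotImplementedError

def make_quiz(n: int):
    """Generate n simple arithmetic questions with tool-computed ground truth."""
    quiz = []
    for _ in range(n):
        a, b = random.randint(2, 99), random.randint(2, 99)
        op = random.choice(list(OPS))
        question = f"What is {a} {op} {b}? Answer with just the number."
        quiz.append((question, OPS[op](a, b)))
    return quiz

def collect_verified_pairs(n: int):
    """Keep only (question, answer) pairs where the model's reply checks out."""
    verified = []
    for question, truth in make_quiz(n):
        reply = ask_llm(question).strip()
        if reply.lstrip("-").isdigit() and int(reply) == truth:
            verified.append((question, reply))
    return verified
```

The same pattern applies to anything where a tool can check the answer (calculators, compilers, unit tests), which is what makes this kind of self-generated training data cheap for those domains.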
A third is that GPT hasn't even scratched the surface of what is available in digital collections alone. E.g. GPT3 was trained on "only" about 200 million Norwegian words (I don't have data for GPT4). Norwegian is a tiny language - this was 0.1% of GPT3's total corpus. But the Norwegian National Library has 8.5m items, which include something like 10-20 billion words in books alone, and many tens of billions more in newspapers, magazines and other data. That's one tiny language. We're many generations of LLMs away from even approaching exhausting the already available digital collections alone, and that's before we look at having the models trained on that data generate and judge training data.
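For the sake of clarity, a back-of-the-envelope version of that arithmetic (the 200 million word and 0.1% figures are the ones quoted above; the totals are just derived from them, not separately sourced):

```python
# Rough arithmetic based on the figures quoted above; nothing here is a sourced statistic.
norwegian_words_in_gpt3 = 200e6      # ~200 million Norwegian words in GPT-3's training data
share_of_corpus = 0.001              # ~0.1% of the total corpus
implied_total_corpus = norwegian_words_in_gpt3 / share_of_corpus
print(f"Implied GPT-3 corpus: ~{implied_total_corpus / 1e9:.0f} billion words")

books_alone = (10e9, 20e9)           # Norwegian National Library, words in books alone (rough)
low, high = (w / norwegian_words_in_gpt3 for w in books_alone)
print(f"Books alone: ~{low:.0f}x to ~{high:.0f}x the Norwegian text GPT-3 actually saw")
```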
GPT4 isn't "flat out not able to do it" when reminded. My point was that I have had the same experience of having to prompt step by step and go "why did you do that? Follow the steps" with both fully functional, normally intelligent people and with GPT4 for similarly complex tasks, and given the improvement between 3.5 and 4 there's little reason to assume this won't keep improving for at least some time more.
> That's because GPT4 has been custom tuned and trained on that specific task as well, along with many others. It's that training, why it was necessary and how it works that the paper referred to previously was about.
So it can do it when trained, just like people, in other words.
> They will infer from that they work cognitively in the same way as humans
And that would be bad. But so is automatically assuming that there's any fundamental difference between how they work and how human reasoning work given that we simply do not know how human reasoning work, and given that LLMs in increasing number of areas show similar behaviour (failure to e.g. fall back on learned rules) when their reasoning breaks down as untrained people.
Again, I'm not saying they're reasoning like people, but I'm saying that we know very little about what the qualitative differences are outside of the few glaringly obvious aspects (e.g. lack of lasting memory and lack of ongoing reinforcement during operation), and we don't know how necessary those will be (we do know that humans can "function" for some values of function without the ability to form new lasting memories, but obviously it provides significant functional impairment).