My point is that the limits of LLMs will be hit long before they start to take on human capabilities.

The problem isn’t that exponential growth is hard to visualise. The problem is that LLMs, as advanced and useful a technique as they are, aren’t suited for AGI and thus will never get us even remotely close to AGI.

The human-like capabilities are really just smoke and mirrors.

It’s like when people anthropomorphise their car: “she’s being temperamental today”. Except we know the car is not intelligent and it’s just a mechanical problem. Whereas it’s in the AI tech firms’ best interest to upsell the human-like characteristics of LLMs, because that’s how they get VC money. And as we know, building and running models isn’t cheap.



> the limits of LLMs will be hit long before they start to take on human capabilities

Against that you have stuff like DeepMind getting gold in the International Collegiate Programming Contest the other week, including solving one problem where "none of the human teams, including the top performers from universities in Russia, China and Japan, got it right" https://www.theguardian.com/technology/2025/sep/17/google-de...

There's kind of a contradiction in them being nowhere near human capabilities while also beating humans in various competitions.


I don’t see that as a contradiction, but I do appreciate why some might.

You can train anything to be really good at a specialised field. But that doesn’t mean it’s a good generalist.

For example:

You can train almost any child to memorise the 10 times table. But that doesn’t mean they can perform long division.

Being an Olympic-class cyclist doesn’t mean you’re any good as an F1 driver, or at swimming, or at fencing.

Being highly specialised usually means you’re not as good at general things. And that’s as true for humans as it is for computers.


Though in your examples, cyclists can learn to drive, since humans have broadly similar underlying abilities.

I'll give you that current GPT stuff has its limitations - it can't come fix your plumbing, say, and pre-trained transformers aren't good at learning things after their pre-training - but I'm not sure they are so far from human capabilities that they can't be fixed up.


You cannot use an LLM to solve mathematical equations.

That’s not a training issue; that’s a limitation of a technology that is, at its core, a text prediction engine.
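
To make “text prediction engine” concrete, here’s a toy Python sketch. It’s nothing like a real transformer internally, but the output loop is the same shape: predict the next token, append it, repeat.

    from collections import Counter, defaultdict

    corpus = "two plus two is four . four plus four is eight .".split()

    # Count how often each word follows each other word.
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def generate(word, steps=6):
        out = [word]
        for _ in range(steps):
            if word not in follows:
                break
            # Greedy decoding: always take the most frequent successor.
            word = follows[word].most_common(1)[0][0]
            out.append(word)
        return " ".join(out)

    print(generate("two"))  # "two plus two plus two plus two" -- it loops

Note it never “solves” anything; it only continues text the way the training data suggests.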


Yet if you look at DeepMind getting gold in the IMO, it seems quite equationish.

Questions and answers: https://storage.googleapis.com/deepmind-media/gemini/IMO_202...


There is no particular reason why AI has to stick to language models, though. Indeed, if you want human-like thinking you pretty much have to go beyond language, since we do other stuff too, if you see what I mean. A recent example: "Google DeepMind unveils its first “thinking” robotics AI" https://arstechnica.com/google/2025/09/google-deepmind-unvei...


> There is no particular reason why AI has to stick to language models, though.

There’s no reason at all. But that’s not the technology that’s in the consumer space, growing exponentially, gaining all the current hype.

So at this point in time, it’s just a theoretical future that will inevitably happen, but we don’t know when. It could be next year. It could be 10 years. It could be 100 years or more.

My prediction is that current AI tech plateaus long before any AGI-capable technology emerges.


Yeah, quite possible.


That's a rather poor choice for an example considering Gemini Robotics-ER is built on a tuned version of Gemini, which is itself an LLM. And while the action model is impressive, the actual "reasoning" here is still being handled by an LLM.

From the paper [0]:

> Gemini Robotics 1.5 model family. Both Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 inherit Gemini’s multimodal world knowledge.

> Agentic System Architecture. The full agentic system consists of an orchestrator and an action model that are implemented by the VLM and the VLA, respectively:

> • Orchestrator: The orchestrator processes user input and environmental feedback and controls the overall task flow. It breaks complex tasks into simpler steps that can be executed by the VLA, and it performs success detection to decide when to switch to the next step. To accomplish a user-specified task, it can leverage digital tools to access external information or perform additional reasoning steps. We use GR-ER 1.5 as the orchestrator.

> • Action model: The action model translates instructions issued by the orchestrator into low-level robot actions. It is made available to the orchestrator as a specialized tool and receives instructions via open-vocabulary natural language. The action model is implemented by the GR 1.5 model.
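
For what that division of labour looks like in code, here's a rough sketch of the orchestrator/action-model loop the paper describes. vlm_orchestrate and vla_act are made-up stand-ins for GR-ER 1.5 and GR 1.5; nothing here is the actual DeepMind API.

    def vlm_orchestrate(task: str, feedback: list[str]) -> list[str]:
        """Hypothetical VLM call: break a task into executable steps."""
        # In the real system this is an LLM-family model doing the reasoning.
        return [f"step {i + 1} of '{task}'" for i in range(3)]

    def vla_act(instruction: str) -> str:
        """Hypothetical VLA call: turn one instruction into low-level actions."""
        return f"executed: {instruction}"

    def run(task: str) -> None:
        feedback: list[str] = []
        for step in vlm_orchestrate(task, feedback):
            result = vla_act(step)   # action model invoked as a "tool"
            feedback.append(result)  # orchestrator sees the outcome
            print(result)

    run("sort the laundry")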

AI researchers have been trying to discover workable architectures for decades, and LLMs are the best we've got so far. There is no reason to believe that this exponential growth on test scores would or even could transfer to other architectures. In fact, the core advantage that LLMs have here is that they can be trained on vast, vast amounts of text scraped from the internet and taken from pirated books. Other model architectures that don't involve next-token-prediction cannot be trained using that same bottomless data source, and trying to learn quickly from real-world experiences is still a problem we haven't solved.

[0] https://storage.googleapis.com/deepmind-media/gemini-robotic...


My problem with takes like this is it presumes a level of understanding of intelligence in general that we simply do not have. We do not understand consciousness at all, much less consciousness that exhibits human intelligence. How are we to know what the exact conditions are that result in human-like intelligence? You’re assuming that there isn’t some emergent phenomenon that LLMs could very well achieve, but have not yet.


I'm not making a philosophical argument about what human-like intelligence is. I'm saying LLMs have many weaknesses that make them incapable of performing basic functions that humans take for granted, like counting and recall.

I go into much more detail here: https://news.ycombinator.com/item?id=45422808

Ostensibly, AGI might use LLMs in parts of its subsystems. But the technology behind LLMs doesn't adapt to all of the problems that AGI would need to solve.

It's a little like how the human brain isn't just one homogeneous grey lump. There are different parts of the brain that specialise in different parts of cognitive processing.

LLMs might work for language processing, but that doesn't mean they would work for maths reasoning -- and in fact we already know they don't.

This is why we need tools / MCPs: we need ways of turning problems LLMs cannot solve into standalone programs that LLMs can cheat with by asking them for the answers.
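
A minimal sketch of that pattern, assuming a model that emits structured tool calls (the names here are illustrative, not any particular MCP implementation):

    import json

    def calculator_tool(expression: str) -> str:
        # Deterministic, non-AI code path that actually does the maths.
        # eval() is for the sketch only; a real tool would parse safely.
        return str(eval(expression, {"__builtins__": {}}))

    # What the LLM might emit when it recognises it can't compute this itself:
    model_output = json.dumps(
        {"tool": "calculator", "args": {"expression": "137 * 249"}}
    )

    call = json.loads(model_output)
    if call["tool"] == "calculator":
        answer = calculator_tool(call["args"]["expression"])
        print(answer)  # 34113, fed back into the next prompt as tool output

The LLM never does the arithmetic; it only recognises that the calculator should.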


> the limits of LLMs will be hit long before they start to take on human capabilities.

Why do you think this? The rest of the comment is just rephrasing this point ("LLMs aren't suited for AGI"), but you don't seem to provide any argument.


Fair point.

Basically AGI describes human-like capabilities.

The problem with LLMs is that they’re, at their core, token prediction models. Tokens, typically pieces of text, are given numeric values and can then be used to predict which tokens should follow.

This makes them extremely good at things like working with source code and other sources of text where relationships are defined via semantics.
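
As a toy illustration of that token-to-number mapping (real LLMs learn subword vocabularies with ~100k entries; this made-up vocab just shows the round trip):

    vocab = {"def": 0, "add": 1, "(": 2, "a": 3, ",": 4, "b": 5, ")": 6, ":": 7}
    inverse = {i: tok for tok, i in vocab.items()}

    tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
    ids = [vocab[t] for t in tokens]
    print(ids)                                # [0, 1, 2, 3, 4, 5, 6, 7]
    print(" ".join(inverse[i] for i in ids))  # def add ( a , b ) :

The model only ever sees the integers; everything it “knows” is statistical relationships between them.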

The problem with this is that it makes them very poor at dealing with:

1. Limited datasets. Smaller models are shown to be less powerful. So LLMs often need to ingest significantly more information than a human would learn in their entire lifetime, just to approximate what that human might produce in any specific subject.

2. Learning new content. Here we have to rely on non-AI tooling like MCPs. This works really well with current models because we can say “scrape these software development references” (etc.) to keep the model up to date. But there’s no independence behind those actions: an MCP only works because it injects into the prompt how to use that MCP and why you should use it. Whereas if you look at humans, even babies know how to investigate and learn independently. Our ability to self-learn is one of the core principles of human intelligence.

3. Remembering past content that resides outside of the original model training. I think this is actually a solvable problem for LLMs, but their current behaviour is to bundle all the prior interactions into the next prompt. In reality, the LLM hasn’t really remembered anything; you’re just reminding it about everything with each exchange (see the sketch after this list). So each subsequent prompt gets longer and thus more fallible. It also means that context is always volatile. Basically it’s just a hack that only works because context sizes have grown exponentially. But if we want AGI then there needs to be a persistent way of retaining that context. There are some workarounds here, but they depend on tools.

4. Any operation that isn’t semantics-driven. Things like maths, for example. LLMs have to call a tool (like an MCP) to perform calculations. But that requires a non-AI function to return a result, rather than the AI reasoning about maths. So it’s another hack. And there are a lot of domains in this category where complex tokenisation is simply not enough. This, I think, is going to be the biggest hurdle for LLMs.

5. Anything related to the physical world. We’ve all seen examples of computer vision models drawing too many fingers on a hand, or leaving disembodied objects floating. The solutions here are to define what a hand should look like. But without the AI having access to a physical three-dimensional world to explore, it’s all just guessing what things might look like. This is particularly hard for LLMs because they’re language models, not 3D coordinate systems.
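
To illustrate point 3 above, here’s a minimal sketch of a typical chat loop against a stateless completion API. call_llm is a hypothetical stand-in for any such API; the point is that the whole transcript gets re-sent every turn.

    def call_llm(messages: list[dict]) -> str:
        # Hypothetical stateless LLM call; it sees only what it's handed.
        return f"(reply based on {len(messages)} messages)"

    history: list[dict] = []
    for user_text in ["hi", "my name is Sam", "what's my name?"]:
        history.append({"role": "user", "content": user_text})
        reply = call_llm(history)  # entire transcript goes in every time
        history.append({"role": "assistant", "content": reply})
        print(len(history), reply)  # the "remembered" context grows each turn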

There’s also the question about whether holding vector databases of token weights is the same thing as “reasoning”, but I’ll leave that argument for the philosophers.

I think a theoretical AGI might use LLMs as part of its subsystems. But it would need to leverage AI throughout, handling topics that are more than just token relationships, which LLMs cannot do.


AI services are going, or will go, hybrid. Just like we have seen in search, with thousands of dedicated subsystems handling niches behind a single unified UI element or API call.
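
As a hedged sketch of what that hybrid routing might look like, a single entry point dispatching to dedicated backends (the subsystem names are made up for illustration):

    def route(query: str) -> str:
        if any(ch.isdigit() for ch in query) and any(op in query for op in "+-*/"):
            return "calculator subsystem"
        if query.endswith("?"):
            return "LLM answer subsystem"
        return "keyword search subsystem"

    for q in ["2 + 2", "who wrote Dune?", "cheap flights"]:
        print(q, "->", route(q))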


“Hybrid” is just another way of saying “AI isn’t good enough to work independently”. Which is the crux of my point.



