It's looking increasingly naive to assume scaling LLMs is all you need to get to full white-collar worker replacement. The attention mechanism / Hopfield network is fundamentally modeling only a small subset of the full human brain, and the sustained hype around bolted-on solutions for "agentic memory" is, in my opinion, glaring evidence that these SOTA transformers alone aren't sufficient even when you just limit the space to text. Maybe I'm just parroting Yann LeCun.


You probably are.

The "small subset" argument is profoundly unconvincing, and inconsistent with both neurobiology of the human brain and the actual performance of LLMs.

The transformer architecture is incredibly universal and highly expressive. Transformers power LLMs, video generator models, audio generator models, SLAM models, entire VLAs and more. It's not a 1:1 copy of the human brain, but that doesn't mean it's incapable of reaching functional equivalence. The human brain isn't the only way to implement general intelligence - just the one that was easiest for evolution to put together out of what it had.

LeCun's arguments that "LLMs can't do X" keep being proven wrong empirically. Even on ARC-AGI-3, a benchmark specifically designed to be adversarial to LLMs and to target the weakest capabilities of off-the-shelf LLMs, no other class of AI beats them.


> The human brain isn't the only way to implement general intelligence - just the one that was easiest for evolution to put together out of what it had.

The human brain is not a pretrained system. It's objectively more flexible than transformers and capable of self-modulation in ways that no ML architecture can replicate (that I'm aware of).


The human brain's "pre-training" is evolution cramming an enormous amount of structure into it. It "learns from scratch" the way it does only because it doesn't actually learn from scratch.

I've seen plenty of wacky test-time training approaches in ML nowadays, which are probably the closest thing to how the human brain learns. None are stable enough to go into frontier LLMs, where in-context learning still reigns supreme. In-context learning is a "good enough" approximation of continuous learning, it seems.


> In-context learning is a "good enough" approximation of continuous learning, it seems.

"it seems" is doing a herculean effort holding your argument up, in this statement. Say, how many "R"s are in Strawberry?


If you think that "strawberry" is some kind of own, I don't know what to tell you. It takes deep and profound ignorance of both the technical basics of modern AIs and the current SOTA to do this kind of thing.

LLMs get better release to release. Unfortunately, the quality of human discussion around LLM capabilities is consistently abysmal. I wouldn't be seeing the same "LLMs are FUNDAMENTALLY FLAWED because I SAY SO" repeated ad nauseam otherwise.


I can ask a nine-year-old human brain to solve that problem with a box of Crayola and a sheet of A4 printer paper.

In-context learning is demonstrably not "good enough" to approximate the continuous learning of even a child.


You're absolutely wrong!

You can also ask an LLM to solve that problem by spelling the word out first. And then it'll count the letters successfully. At a similar success rate to actual nine-year-olds.

There's a technical explanation for why that works, but to you, it might as well be black magic.

And if you could get a modern agentic LLM that somehow still fails that test? Chances are, it would solve it with no instructions - just one "you're wrong".

1. The LLM makes a mistake

2. User says "you're wrong"

3. The LLM re-checks by spelling the word out (sketched below) and gives a correct answer

4. The LLM then keeps re-checking itself using the same method for any similar inquiry within that context
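
Step 3 in practice looks something like this (a sketch of typical model output, not a transcript from any particular model):

    s - t - r - a - w - b - e - r - r - y
    "r" at positions 3, 8, and 9 -> three "r"s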

In-context learning isn't replaced by anything better because it's so powerful that finding "anything better" is incredibly hard. It's the bread and butter of how modern LLM workflows function.


This is false. You can ask it to spell out strawberry and count the letters and it will still say 2 (it's unable to actually count the letters by the way). The only way to get a model that believes strawberry has 2 R's to consistently give the correct answer is to ask it to code the problem and return the output.
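
For what it's worth, the "code the problem" version is trivial. A minimal sketch of the kind of snippet a model would emit and run through a code-execution tool (assuming Python tool use):

    word = "strawberry"
    print(word.count("r"))  # prints 3 - plain string counting, no tokenizer in the way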

In fact, asking a model not to repeat the same mistake makes it more likely to commit that mistake again, because it's in its context.

I think anyone who uses LLMs a lot will tell you that your steps 3 and 4 are fictional.


Have you actually tried?

The "spell out" trick, by the way, was what was added to the system prompts of frontier models back when this entire meme was first going around. It did mitigate the issue.


> it's so powerful that finding "anything better" is incredibly hard.

We're back around to the start again. "Incredibly hard" is doing all of the heavy lifting in this statement; it's not all-powerful, and there are enormous failure cases. Neither the human brain nor LLMs are a panacea for thought, but nobody in academia or otherwise is seriously comparing GPT to the human brain. They're distinct.

> There's a technical explanation for why that works, but to you, it might as well be black magic.

Expound however much you need. If there's one thing I've learned over the past 12 months, it's that everyone is now an expert on the transformer architecture and everyone else is wrong. I'm all ears if you've got a technical argument to make; the qualitative comparison isn't convincing me.


I do know far more than you, which is a laughably low bar. If you want someone to hold your hand through it, ask an LLM.

The key words are "tokenization" and "metaknowledge", the latter being the only non-trivial part. An LLM can explain it in detail. They know more than you do too.
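
The tokenization half takes a few lines to see for yourself (a minimal sketch, assuming the tiktoken library is installed; exact splits vary by model):

    import tiktoken

    # Encode the word and decode each token individually to expose the splits.
    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    print([enc.decode([i]) for i in ids])  # e.g. ['str', 'aw', 'berry']
    # The model consumes opaque subword IDs, never individual letters, so
    # counting "r"s relies on learned facts about spellings (the
    # "metaknowledge" part) rather than direct inspection.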


Why is the breakdown from words to letters your highest-priority thing to add to the training data?

What problem does this allow you to solve that you couldn't otherwise?


This comment is tangential to their point about whether a transformer architecture can be functionally equivalent to a human brain. The practicality of those limitations is a different discussion.


> you just limit the space to text

And even then... why can't they write a novel? Or lowering the bar, let's say a novella like Death in Venice, Candide, The Metamorphosis, Breakfast at Tiffany's...?

Every book's in the training corpus...

Is it just a matter of someone not having spent a hundred grand in tokens to do it?


I know someone who spends basically every day writing personal fan fiction using every model you can find. She doesn't want to share it, and she does complain about it a lot; it seems maintaining consistency for something, say, 100 pages long is difficult.


I don't understand - there are hundreds/thousands of AI-written books available now.


I've skimmed a few, and one can immediately tell they don't meet the average writing level you'd see in a local writers' workshop, much less that of Mann or Capote.


Never mind novels, it can't even write a good Reddit-style or HN-style comment. agentalcove.ai has an archive of AI models chatting to one another in "forum" style, and even though it's a good showcase of the models' overall knowledge, the AI-isms are quite glaring.


They definitely can, and do.

It's just that the ones that manage to suppress all the AI writing "tells" go unnoticed as AI. This is a type of survivorship bias, though I feel there must be a better term for it that eludes me.


Who says they can't? What's your bar that needs to be passed in order for "written a novella" to be achieved?

There's a lot of bad writing out there; I can't imagine nobody has used an LLM to write a bad novella.


> What's your bar that needs to be passed

I provide four examples in my comment...


Your qualification for whether an LLM can write a novella is that it has to be as good as The Metamorphosis?

Yes, those are examples of novellas; surely you believe an LLM could write a bad novella? I'm not sure what your point is. Either you think it can't string words together at that length, or your standard is that it can't write a foundational piece of literature that stays relevant for generations... I'm not sure which.


I don't think it can write something that's even a fraction of the quality of Kafka.

But GP's argument ("limit the space to text") could be taken to imply - and it seems to be a common implication these days - that LLMs have mastered the text medium, or that they will very soon.

> it can't write a foundational piece of literature

Why not, if this is a purely textual medium, the corpus includes all the great stories ever written, and possibly many writing workshops and great literature courses?


I don't know what to tell you. It's more than a little absurd to make the qualification for being able to do something that the output has to be considered a great work of art for generations.


I agree that the argument starts from a reduction to the absurd.

So at least we can agree that AI hasn't mastered the text medium, without further qualification?

And what about my argument, further qualified, which is that I don't think it could even write as well as a good professional writer - not necessarily a generational one?


>AI hasn't mastered the text medium

I don't know what this means, and I don't know what would qualify it as having "mastered" anything. Seems like a no-true-Scotsman thing, where regardless there would always be someone saying it couldn't actually do the thing because of this or that.

>why can't they write a novel?

This is what I'm disagreeing with. I think an LLM can write a novel well enough that it's recognizably a pretty mediocre novel, no worse than the median human-written novel, which to be fair is pretty bad. You seem to have an unqualified bar something needs to pass before "writing a novel" is accomplished, but it's not clear what that is. At the same time, you're switching between the ability to do a thing and the ability to do a thing in a way that's honored as the best of the best for a century. So, I don't know, it kind of seems like you just don't like AI and have a different standard for it that adjusts so that it fails. It doesn't match the standard you'd apply to some random Bob's ability to do a thing.


I don't dislike AI, I use it every day for coding and increasingly for non-technical tasks, and have also used it in enterprise workloads to great success. I am fairly optimistic about it - I think it will remove a lot of drudgery and make things economical which previously weren't.

I am just challenging the notions that "if you limit it to text, it's doing really well" or that the text contains in itself all the information that is needed to carry out a task to a certain level of quality. This applies in my experience not only to writing literature but also to certain human tasks which may appear mundane and easy to automate.


If the end result is that most books will be written by AI, you need the possibility of that qualification. If it's only capable of certain types of book, then we will need endless amounts of that.


I think they're as good as they're going to get from scaling. They can still get more efficient, and tooling/harnesses around them will improve.


How did we get to allowing this in the USA? I remember the zeitgeist used to be to make fun of China's mass surveillance / social credit system, and ten years ago, proposing to build something like this in the USA would have been unthinkable. It's wild that we're just willingly sliding into the same system here too.


> ten years ago, proposing to build something like this in the USA would have been unthinkable

I think you have your history a bit mixed up. In 2013, Snowden exposed the PRISM program and nobody gave a rat's ass. It was a clear and booming signal that nobody really cares about privacy in the US, and that fascist interests had an opportunity to expand. I think Flock would have done really well back then. There is a long, bloody road of futile fighting against the surveillance machine the US has become.

We love to elevate ourselves above China while engaging in many of the same behaviors (although our version of insidious mass surveillance is privatized, which magically makes it better).

All that to say, adjust your timeline by a decade or two and your statement is correct again.


It's been bad since the Patriot Act.


I for one cannot wait to run my cloud infrastructure on Tesla Web Services


Running Dojo D1 chips?


If you're interested in this kind of thing, iNaturalist is another great resource with significantly more activity (at least in the US).


The first ten images are... interesting.


I discovered the same thing. I, umm... don't think these were generated by the neural net?


Actually, I'm pretty sure those are generated by a neural net. There are clear artifacts on the images that give it away. Maybe the author experimented with different source material before going for abstract paintings.


A real life easter egg!

