
My job may have become part of the training data with how much coverage there is around it. Perhaps another career would be a better test of LLM capabilities.

Have you ever heard of a black swan?

We used to get one annual release that was 2x as good; now we get quarterly releases that are 25% better. Compounded over the year, we're now at about 2.4x better.
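A quick sanity check on the compounding, as a minimal Python sketch (the 25%-per-quarter figure is the assumption from above, not a measured number):

    # Four quarterly releases, each a hypothetical 25% better than the last
    quarterly_gain = 1.25
    annual_gain = quarterly_gain ** 4  # compounds to ~2.44x per year
    print(round(annual_gain, 2))       # 2.44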

The weirdest thing about this AI revolution is how smooth and continuous it is. If you look closely at the differences between 4.5 and 4.6, the improvements are subtle and hard to pin down.

A year ago today, Sonnet 3.5 (new) was the newest model. A week later, Sonnet 3.7 would be released.

Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.

Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?


If you've been using them, each new step is very noticeable, and so is the shift in mindshare. Around Sonnet 3.7, Claude Code-style coding became usable and very quickly gained a lot of market share. Opus 4 could tackle significantly more complexity. Opus 4.6 has been another noticeable step up for me: suddenly I can let CC run far more independently, allowing multiple parallel agents where previously too much babysitting was required for that.

I think this is where there's a huge distinction between ability/performance/benchmark figures and utility. You can have smooth improvements to performance, but marked step changes in utility as they cross thresholds where you're able to use them for new tasks.

> If you've been using them, each new step is very noticeable, and so is the shift in mindshare. Around Sonnet 3.7, Claude Code-style coding became usable

Yet I vividly remember the complaints about how 3.7 was a regression compared to 3.5, with people advising to stay on 3.5.

Conversely, Sonnet 4 was well received so it's not just a story about how complainers make the most noise.


In terms of real work, it was the 4 series models. That raised the floor of Sonnet high enough to be "reliable" for common tasks and Opus 4 was capable of handling some hard problems. It still had a big reward hacking/deception problem that Codex models don't display so much, but with Opus 4.5+ it's fairly reliable.

Honestly, Opus 4.5 was the game changer. The jump from Sonnet 4.5 to that was massive.

But I'm on Codex GPT 5.3 this month, and it's also quite amazing.


Until an hour ago, I had not used Claude much since probably before GPT-5. I had only been using Gemini for the last 3 months.

Sonnet 4.6 extended on the free plan is just incredible. I am completely floored by it. The conversation I just had with it was nuts. It started from Dario mentioning something like a 20% chance that Claude is conscious, or something crazy like that. I have always tried that conversation with previous models, but it got boring fast.

There is something about the way it can organize context without getting lost that completely blows Gemini away.

Maybe even more so, it was the first time it felt like a model pushed back a little, and the answers were not just me ultimately steering it toward certain conclusions. For the free plan, that is nuts.

In terms of being conscious, it is the first time I would say I am not 100% certain it is just a very useful, very smart stochastic parrot. I wouldn't want to say more than that, but 15-20% doesn't sound as insane to me as it did 2 hours ago.


> Or, somehow, against all probability and plausibility, are we all still early?

What does this even mean? It's obvious we're still early and I think it's a very common opinion.


I always grew up hearing “competition is good for the consumer.” But I never really internalized how good fierce battles for market share are. The amount of competition in a space is directly proportional to how good the results are for consumers.

Remember when GPT-2 was “too dangerous to release” in 2019? That could have still been the state in 2026 if they didn’t YOLO it and ship ChatGPT to kick off this whole race.

I was just thinking earlier today how in an alternate universe, probably not too far removed from our own, Google has a monopoly on transformers and we are all stuck with a single GPT-3.5 level model, and Google has a GPT-4o model behind the scenes that it is terrified to release (but using heavily internally).

This was basically almost real.

Before ChatGPT was even released, Google had an internal-only chat-tuned LLM. It went "viral" because some of the testers thought it was sentient and it caused a whole media circus. This is partially why Google was so ill-equipped to even start competing: they had fresh wounds from a crazy media circus.

My pet theory, though, is that this news is what inspired OpenAI to chat-tune GPT-3, which was a pretty cool text-generator model but not a chat model. So it may have been a necessary step to get chat LLMs out of Mountain View and into the real world.

https://www.scientificamerican.com/article/google-engineer-c...

https://www.theguardian.com/technology/2022/jul/23/google-fi...


> some of the testers thought it was sentient and it caused a whole media circus.

Not "some of the testers." One engineer.

He realized he could get a lot of attention by claiming (with no evidence and no understanding of what sentience means) that the LLM was sentient and made a huge stink about it.


He was unfairly labelled as a lunatic early on. I'd implore anyone reading this thread to see what he had to say for themselves and form their own opinion: https://youtube.com/watch?v=kgCUn4fQTsc

He had a history of causing noise at Google’s weekly leadership Q&A.

Now think about how often the patent system has stifled, stalled, and delayed advancement, sometimes for decades per innovation.

Where would we be if patents never existed?


Who knows? If we’d never moved on from trade secrets to patents, we might be a hundred years behind.

Is that really the case in the last few years/decades?

My understanding is that any company that can (read: has enough money for good lawyers) will prefer to use trade secrets, for a combination of reasons. A big one is that, unlike with a patent, competitors can never use the technology, rather than being free to use it after 10 years or whenever the patent expires.

Admittedly this was from my entrepreneurship classes in a European uni, so I'm not sure how it is in different places in the world.


Patents in the US are 20 years. Given how short-sighted modern companies are, I can't imagine anyone at any large company is even planning for something 20 years in the future, much less placing much value on an outcome that far out.

To be fair, Google has a patent on the transformer architecture. Their PageRank patent monopoly probably helped fund the R&D.

They also had a patent on MapReduce.

It would have been nice for me to be able to work a few more years and be able to retire.

Will your retirement be enjoyable if everyone else around you is struggling?

What does that mean? Everyone was going to struggle because I still had my 9-to-5 middle-class job?

They didn't YOLO ChatGPT. There were more than a few iterations of GPT-3 over a few years, which were actually over-moderated. Then they released a research preview named ChatGPT (barely functional by modern standards) that got traction outside the tech community because it was free, and so the pivot ensued.

I also remember when the PlayStation 2 required an export control license because its 1 GFLOP of compute was considered dangerous.

That was also brilliant marketing.


In 2019 the technology was new and there was no 'counter' at that time. The average person was not thinking about the presence and prevalence of AI in the way we do now.

It was kinda like having muskets against indigenous tribes in the 1400s-1500s vs a machine gun against a modern city today. The machine gun is objectively better, but it has not kept pace with the increase in defensive capability of a modern city with a modern police force.


That's rewriting history. What they said at the time:

> Nearly a year ago we wrote in the OpenAI Charter: “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research,” and we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time. This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas. -- https://openai.com/index/better-language-models/

Then over the next few months they released increasingly large models, with the full model public in November 2019 (https://openai.com/index/gpt-2-1-5b-release/), well before ChatGPT.


They said:

> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT‑2 along with sampling code.

"Too dangerous to release" is accurate. There's no rewriting of history.


Well, and it's being used to generate deceptive, biased, or abusive language at scale. But they're not concerned anymore.

They've decided that the money they'll make is too important, who cares about externalities...

It's quite depressing.


Link?

Link for what?

> Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.

I wouldn't call it rewriting history to say they initially considered GPT-2 too dangerous to be released. If they'd applied this approach to subsequent models rather than making them available via ChatGPT and an API, it's conceivable that LLMs would be 3-5 years behind where they currently are in the development cycle.


I think the diffusion model race would've kicked off anyway. Didn't it even start before ChatGPT was released?

I think the spark would've been lit either way.

It's kind of funny how both of these things kicked off within a few months.


Yeah, and Jurassic Park wouldn't have been a movie if they decided against breeding the dinosaurs.

Competition is great, but it's so much better when it is all about shaving costs. I am afraid that what we are seeing here is an arms race with no moat: something that will behave a lot like an all-pay auction, where every competitor's investment is sunk whether or not they win. Since the winner takes all, and it never makes sense to stop the marginal investment while you think you still have a chance of winning, ultimately more resources get spent than the value ever created.

This might not be what we are facing here, but seeing how little moat anyone in AI has, I just can't discount the risk. And then, instead of the consumers of today getting a great deal, we zoom out and see that 5x more was spent developing the tech than was needed, and that's not all that great economically as a whole. It's not as if, say, the weights from a 3-year-old model are useful capital to be reused later, the way the dot-com boom left us with far more fiber than was needed, but fiber that could be bought and turned on profitably later.
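A toy sketch of that worry in Python (every figure here is invented for illustration, not a real estimate):

    # Hypothetical all-pay race: each lab's spend is sunk, win or lose
    labs = 5
    spend_per_lab = 20                 # $B invested by each lab
    prize = 50                         # $B of value captured by the single winner
    total_spend = labs * spend_per_lab
    print(total_spend, prize)          # 100 50 -> twice the prize burned in the race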


Three-year-old models aren't useful because there are (1) cheaper models that are roughly equivalent, and (2) better models.

If Sonnet 4.6 is actually "good enough" in some respects, maybe the models will just get cheaper along one branch, while they get better on a different branch.


It's funny, it sure seems like software projects in general follow the Lindy effect: considering their age and mindshare, I can safely predict gcc, emacs, SQLite, and Python will still be running somewhere 10, 20, 30 years from now. Indeed, people will choose to use certain software specifically because it's been around forever; it's tried and true.

But LLMs, and AI-related tooling, seem to really buck that trend: they're obsoleted almost as soon as they're released.


AI-related tooling is pretty fungible, but AI models get immediately obsoleted due to the unit economics around training models, as well as the fact that nobody releases their datasets or training paradigms in useful detail (the best we get is the model weights, because of copyright, etc.).

We saw that with PCs in the '80s, because performance was advancing rapidly. It slowed down somewhat as computers became good enough.

People are rapidly learning how to improve model capabilities and lower resource requirements. The models we throw away as we go are the steps we climbed along the way.

The really interesting part is how often you see people on HN deny this. People have been saying that token costs will 10x, or that AI companies are intentionally making their models worse to trick you into consuming more tokens. As if making a better model isn't the most cut-throat competition (probably the most competitive market in human history) right now.

I mean, enshittification has not quite begun yet. Everyone is still raising capital so current investors can pass the bag to the next set. As soon as the money runs out, monetization will overtake valuation as the top priority. Then suddenly when you ask any of these models “how do I make chocolate chip cookies?” you will get something like:

> You will need one cup King Arthur All Purpose white flour, one large brown Eggland’s Best egg (a good source of Omega-3 and healthy cholesterol), one cup of water (be sure to use your Pyrex brand measuring cup), half a cup of Toll House Milk Chocolate Chips…

> Combine the sugar and egg in your 3 quart KitchenAid Mixer and mix until…

All of this will contain links and AdSense-looking ads. For $200/month they will limit it to in-house ads about their $500/month model.


While this is funny, the actual race has already started in how companies can nudge LLM results toward their products. We can't be saved from enshittification, I fear.

I am excited about a future where I am constantly reminded to like and subscribe my LLM’s output.

I'm concerned for a future where adults stop realizing they themselves sound like LLMs because the majority of their interaction/reading is output from LLMs. Decades of corporations being the ones molding the very language we use is going to have an interesting effect.

Only until the music stops. Racing to give away the most stuff for free can only last so long. Eventually you run out of other people’s money.

Uber managed to make it work for quite a while

They did, but Uber is no longer cheap [1]. Is the parent’s point that it can’t last forever? For Uber it lasted long enough to drive most of the competition away.

[1] https://www.theguardian.com/technology/2025/jun/25/second-st...


Uber's in a business where you have some amount of network effect: you need both drivers available on your app and customers hailing rides. Without a sufficient quantity of either, you can't really turn a profit.

LLM providers don't, really. As far as I can tell, their moat is the ability to train a model and the hardware to run it. Also, open-weight models provide a floor for model training. I think their big bet is that gathering user data from interactions with the LLM will be so valuable that it results in substantially better models, but I'm not sure that's the case.


Uber's genius was getting their workers (sorry, 'contractors') to carry the capital costs of providing the fleet of vehicles they use.

Their other genius was to operate illegally, make the service so popular that politicians had no choice but to change the laws, and in the process make taxi licences, which used to cost as much as a house, worthless.

Unfortunately, people naively assume all markets behave like this, even when the market, in reality, is not set up for full competition (due to monopolies, monopsonies, informational asymmetry, etc).

And AI is currently killing a bunch of markets intentionally: the RAM deal for OpenAI wouldn't have gone through the way it did if it hadn't been done in secret with anti-competitive restrictions.

There's a world of difference between what's happening and what RAM prices would be if OpenAI and others were just bidding for produced modules as they were released.


This is a bit of a tangent, but it highlights exactly what people miss when talking about China taking over our industries. Right now, China has about 140 different car brands, roughly 100 of which are domestic. Compare that to Europe, where we have about 50 brands competing, or the US, which is essentially a walled garden with fewer than 40.

That level of fierce internal competition is a massive reason why they are beating us so badly on cost-effectiveness and innovation.


It's the low cost of labor, in addition to the lack of environmental regulation, that made China a success story. I'm sure the competition helps too, but it's not the main driver.

Oh, then explain to me how China is leading in both robotics and AI. If it were because of "low cost of labor in addition to lack of environmental regulation", you'd be seeing countries like India beating the US and EU.

That happens in most of the world. Why China, then?

Because they have a billion and a half people and they were willing to be the western world’s factory.

Consequence is they are now facing an issue of “cancer villages” where the soil and water are unbelievably poisonous in many places.

Which isn't particularly unique. It's comparable to something like some subset of Americans getting black lung, or the health problems from the train derailment in East Palestine.

It took a lot of work for environmentalists to get some regulation into the US, Canada, and the EU. China will get to that eventually.


It isn’t. I just bring it up to state there is a very good reason the rest of the world doesn’t just drop their regulations. In the future I imagine China may give up many of these industries and move to cleaner ones, letting someone else take the toxic manufacturing.

Until 2 remain, then it's extraction time.

Or self-host the OSS models on the second-hand GPUs and RAM left over when the big labs implode.

China will stop releasing open-weights models as soon as they get within striking range; cf. Seedance 2.0.

ByteDance never really open-sourced their models, though. But I agree, they will only open-source when it doesn't really matter.

> how good the results are for consumers.

Only if you take consumer electronics out of the equation, because this AI arms race has wreaked havoc on the markets for consumer GPUs, RAM, SSDs, and HDDs.

If you take the arms race externalities into account, I'm very much unconvinced that we're better off than last year.


I grew up with every service getting enshittified in the end. Whoever has more money wins the race and gets richer; that's the free market for ya.

At a certain point though we can't only blame the free market or the companies. Consumers should know better than to choose products that are anti-consumer. The fact that they don't know better and don't care is the bigger problem. Until we figure out what to do about that any solution is going to be dangerously paternalistic.

Both Opus 4.6 and GPT-5.3 one-shot a Game Boy emulator for me. Guess I need a better benchmark.


Is such an emulator not part of their training data sets?


As coding agents get "good enough" the next differentiator will be which one can complete a task in fewer tokens.


Or quicker, or more comprehensively for the same price.


Or the same number of tokens in less time. Kinda feels like the CPU / modem wars of the 90s all over again - I remember those differences you felt going from a 386 -> 486 or from a 2400 -> 9600 baud modem.

We're in the 2400 baud era for coding agents and I for one look forward to the 56k era around the corner ;)
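Since this subthread is comparing agents on three axes (tokens, time, and price), here's a minimal sketch of the arithmetic; every number below is invented for illustration:

    # Hypothetical: same task completed by two agents at the same price per token
    def cost_and_time(tokens, usd_per_mtok, tok_per_sec):
        return tokens / 1e6 * usd_per_mtok, tokens / tok_per_sec

    print(cost_and_time(200_000, 15, 60))  # ($3.00, ~3333 s)
    print(cost_and_time(80_000, 15, 60))   # ($1.20, ~1333 s): fewer tokens wins on cost and wall clock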


There are hundreds of Game Boy emulators available on GitHub that they've been trained on. It's quite literally the simplest piece of emulation you could do. The fact that they couldn't do it before is an indictment of how shit they were, but a Game Boy emulator should be a weekend project for anyone even ever so slightly qualified. Your benchmark was awful to begin with.


Your expectations are wild. Most software engineers could not write a Game Boy emulator - and now you need zero programming skills whatsoever to write one.


Ctrl-C + Ctrl-V. There. Done!

"a gameboy emulator should be a weekend project for anyone even ever so slightly qualified" do you really believe something so ridiculous?


Both Opus 4.6 and GPT-5.3 one-shot a Game Boy emulator for me. Guess I need a better benchmark.


How does that work? Does it actually generate low-level code? Or does it just import libraries that do the real work?


I just one-shot a Game Boy emulator by going to GitHub and cloning one of the hundreds I can find.


I'm not sure I agree. LLMs have the feel of an alien new technology, and especially did back then. In retrospect, it feels very obvious that small models don't pose much of a threat, but that's only in retrospect.


Nobody else should be allowed to build these ... while we build another model that is 10x more capable than the one that was a threat to humanity.

Sign up today to use it, just $10 a month.


> the rich are now permitted to ignore copyrights, while the poor remain constrained by them, as before.

Claude Code is $20 a month, and I get a lot of usage out of it. I don't see how cutting edge AI tools are only for the rich. The name OpenAI is often mocked, but they did succeed at bringing the cutting edge of AI to everyone, time and time again.


Oh, they will totally rent you their privilege, further enriching themselves. Of course!


My cofounder said his plan is $100 a month and if it were $1,000 he’d still pay it.

So much of programming is tedium.


Cofounder of what?


I'm sure Apple is more than happy to pay the premium for cleanness.


Maybe they'd prefer it for aesthetics, but OTOH in iOS 18.2+ they support off-device ChatGPT and apparently refer to it as "ChatGPT" both in settings and when prompting the user to ask if they want to use it.

If they do refer to it as "Gemini", then this is a huge win for Google and a huge loss for OpenAI, since the "ChatGPT" brand really seems to be OpenAI's only real "moat". Although recently there has been about a 20% shift in traffic from ChatGPT to Gemini, so the moat already seems to be running dry.


Today I showed Claude Code how to control my lights, and I'm having a blast.


Claude Code is absurdly good at setting up and configuring Home Assistant.

