We dropped Claude. It's pretty clear this is a race to the bottom, and we don't want a hard dependency on another multi-billion dollar company just to write software
We'll be keeping an eye on open models (of which we already make good use). I think that's the way forward. Actually it would be great if everybody would put more focus on open models, perhaps we can come up with something like the "linux/postgres/git/http/etc" of the LLMs: something we all can benefit from without it being monopolized by a single billionaire company. Wouldn't it be nice if we didn't need to pay for tokens? Paying for infra (servers, electricity) is already expensive enough
>we don't want a hard dependency on another multi-billion dollar company just to write software
One of two main reasons why I'm wary of LLMs. The other is fear of skill atrophy. These two problems compound. Skill atrophy is less bad if the replacement for the previous skill does not depend on a potentially less-than-friendly party.
I was worried about skill atrophy. I recently started a new job, and from day 1 I've been using Claude. 90+% of the code I've written has been with Claude. One of the earlier tickets I was given was to update the documentation for one of our pipelines. I used Claude entirely, starting with having it generate a very long and thorough document, then opening up new contexts and getting it to fact check until it stopped finding issues, and then having it cut out anything that was granular/one query away. And then I read what it had produced.
It was an experiment to see if I could enter a mature codebase I had zero knowledge of, look at it entirely through an AI, and come to understand it.
And it worked! Even though I've only worked on the codebase through Claude, whenever I pick up a ticket nowadays I know what file I'll be editing and how it relates to the rest of the code. If anything, I have a significantly better understanding of the codebase than I would without AI at this point in my onboarding.
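The documentation workflow described above (generate a long draft, fact check it in fresh contexts until no issues turn up, then cut the granular detail) could be sketched roughly like this. All four callables are hypothetical stand-ins for whatever prompts or tooling were actually used; nothing here is a real Claude API, and the round limit is an added assumption.

```python
from typing import Callable, List


def build_onboarding_doc(
    generate: Callable[[], str],
    fact_check: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    trim: Callable[[str], str],
    max_rounds: int = 10,
) -> str:
    """Draft, fact check until clean, then trim (hypothetical workflow sketch)."""
    # Step 1: produce the long, thorough first draft.
    doc = generate()

    # Step 2: each fact_check call stands for opening a new context and
    # asking for errors; stop once a pass reports nothing.
    for _ in range(max_rounds):
        issues = fact_check(doc)
        if not issues:
            break
        doc = revise(doc, issues)

    # Step 3: cut anything granular that is "one query away".
    return trim(doc)
```

The cap on rounds is only there so the loop terminates if the checker keeps finding new issues.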
I have never learned so quickly in my entire life as when I post a forum thread in its entirety into an extended-thinking LLM and am then allowed to ask free-form questions for 2 hours straight if I want to. Having my questions answered NOW is so important for me to learn. Back in the day, by the time I found the answer online I had forgotten the question
Same. I work in the film industry, but I’ve always been interested in computers and have enjoyed tinkering with them since I was about 5. However, coding has always been this insurmountably complicated thing- every time I make an effort to learn, I’m confronted with concepts that are difficult for me to understand and process.
I’ve been 90% vibe coding for a year or so now, and I’ve learned so much about networking just from spinning up a bunch of docker containers and helping GPT or Claude fix niggling issues.
I essentially have an expert (well, maybe not an expert but an entity far more capable than I am on my own) whose shoulder I can look over and ask as many questions as I want to, and who will explain every step of the process to me if I want.
I’m finally able to create things on my computer that I’ve been dreaming about for years.
Some people talk like skill atrophy is inevitable when you use LLMs, which strikes me as pretty absurd given that you are talking about a tool that will answer an infinite number of questions with infinite patience.
I usually learn way more by having Claude do a task and then quizzing it about what it did than by figuring out how to do it myself. When I have to figure out how to do the thing, it takes much more time, so when I'm done I have to move on immediately. When Claude does the task in ten minutes I now have several hours I can dedicate entirely to understanding.
You lose some, you win some. The short-term win could be much higher; however, imagine that the new tool suddenly gets rug-pulled from under your feet. What do you do then? Do you still know how to handle it the old way or do you run into skill atrophy issues? I’m using Claude/Codex as well, but I’m a little worried that the environment we work in will become a lot more bumpy and shifty.
> however imagine that the new tool suddenly gets rug-pulled from under your feet
When you have a headache, do you avoid taking ibuprofen because one day it may not be available anymore? Two hundred years ago, if you gave someone ibuprofen and told them it was the solution for 99% of the cases where they felt some kind of pain, they might be suspicious. Surely that's too good to be true.
But it's not. Ibuprofen really is a free lunch, and so is AI. It's weird to experience, but these kinds of technologies come around pretty often, they just become ubiquitous so quickly that we forget how we got by without them.
The "infinite patience" thing I find particularly interesting.
Every now and then I pause before I ask an LLM to undo something it just did or answer something I know it answered already, somewhere. And then I remember oh yeah, it's an LLM, it's not going to get upset.
I used to speak Russian like I was born in Russia. I stopped speaking Russian … every day I am curious and responsible, but I can hardly say 10 words in Russian today. If you don’t use it (not just be curious and responsible) you will lose it - period.
Programming language is not just syntax, keywords and standard libraries, but also: processes, best practices and design principles. The latter group I guess is more difficult to learn and harder to forget.
I respectfully completely disagree. Not only will you just as easily lose the processes, best practices and design principles, but they will also be changing over time (what was best practice when I got my first gig in 1997 is not a best practice today, and the same goes for even just 4-5 years ago, never mind going all the way back to the 90’s). All that is super easy to both forget and lose unless you live it daily
A fairer comparison would be writing/talking about the Russian language in English. That way you'd still focus on Russian. Same with programming - it's not like you stop seeing any code. So why should you forget it?
Are you sure you would know if it didn't work? I use Claude extensively myself, so I'm not saying this from a "hater" angle, but I had 2 people last week who believe themselves to be in your shoes send me pull requests which made absolutely no sense in the context of the codebase.
No, it hasn't. I did not have a problem before AI with people sending in gigantic pull requests that made absolutely no sense, and justifying them with generated responses that they clearly did not understand. This is not a thing that used to happen. That's not to say people wouldn't have done it if it were possible, but there was a barrier to submitting a pull request that no longer exists.
It's good that it's working for you but I'm not sure what this has to do with skill atrophy. It sounds like you never had this skill (in this case, working with that particular system) to begin with.
>I have a significantly better understanding of the codebase than I would without AI at this point in my onboarding
One of the pitfalls of using AI to learn is the same as I'd see students doing pre-AI with tutoring services. They'd have tutors explain the homework to them and even work through the problems with them. Thing is, any time you see a problem or concept solved, your brain is tricked into thinking you understand the topic enough to do it yourself. It's why people think their job interview questions are much easier than they really are; things just seem obvious when you've thought about the solution. Anyone who's read a tutorial, felt like they understood it well, and then struggled for a while to actually start using the tool to make something new knows the feeling very well. That Todo List app in the tutorial seemed so simple, but the author was making a bunch of decisions constantly that you didn't have to think about as you read it.
So I guess my question would be: If you were on a plane flight with no wifi, and you wanted to do some dev work locally on your laptop, how comfortable would you be vs if you had done all that work yourself rather than via Claude?
> If you were on a plane flight with no wifi, and you wanted to do some dev work locally on your laptop, how comfortable would you be vs if you had done all that work yourself rather than via Claude?
Probably about as comfortable as I would be if I also didn't have my laptop and instead had to sketch out the codebase in a notebook. There's no sense preparing for a scenario where AI isn't available - local models are progressing so quickly that some kind of AI is always going to be available.
So then the argument isn't so much that skill decay isn't an issue but rather that the skill is inherently worthless moving forward. I'm not sure I agree, but I also got a compsci education because I have loved doing it since childhood rather than because I just wanted to make money, and I can see how the latter group would vehemently disagree with me.
For example, Claude was very eager to include function names, implementation details, and the exact variables that are passed between services. But all the info I need for a particular process is the names of the services involved, the files involved, and a one-sentence summary of what happens. If I want to know more, I can tell Claude to read the doc and find out more with a single query (or I can just check for myself).
I've worked with people who will look at code they don't understand, say "llm says this", and express zero intention of learning something. Might even push back. Be proud of their ignorance.
It's like, why even review that PR in the first place if you don't even know what you're working with?
I cringed when I saw a dev literally copy and paste an AI's response to a concern. The concern was one that had layers and implications to it, but instead of getting an answer as to why it was done a certain way and to allay any potential issues, that dev got a two paragraph lecture on how something worked on the surface of it, wrapped in em dashes and joviality.
A good dev would've read deeper into the concern and maybe noticed potential flaws, and if he had his own doubts about what the concern was about, would have asked for more clarification. Not just feed a concern into AI and fling it back. Like please, in this day and age of AI, give the benefit of the doubt that someone with a concern would have checked with AI himself if he had any doubts about his own concern...
It's a new problem in the sense that now executive management at many (if not most) software companies is pushing for all employees to work this way as much as possible. Those same people probably don't know what stack overflow even is.
In my experience, no - I think the ability to build more complete features with less/little/no effort, rather than isolated functions, is (more) appealing to (more) developers.
I don't think so. I'll spend a ton of time and effort thinking through, revising, and planning out the approach, but I let the agent take the wheel when it comes to transpiling that to code. I don't actually care about the code so long as it's secure and works.
I spent years cultivating expertise in C++ and .NET. And I found that time both valuable and enjoyable. But that's because it was a path to solve problems for my team, give guidance, and do so with both breadth and depth.
Now I focus on problems at a higher level of abstraction. I am certain there's still value in understanding ownership semantics and using reflection effectively, but they're broadly less relevant concerns.
These people have always existed. Hell, they are here, too. Now they have a new thing to delegate responsibility to.
And no, I don't understand them at all. Taking responsibility for something, improving it, and stewarding it into production is a fantastic feeling, and much better than reading the comment section. :)
I kind of feel the same. I’m learning things and doing things in areas that I would just skip due to lack of time or fear.
But I’m so much more detached from the code, I don’t feel that ‘deep neural connection’ from actually spending days locked in a refactor or debugging a really complex issue.
I strongly agree on the refactor, but for debugging I have another perspective: I think debugging is changing for the better, so it looks different.
Sure, you don't know the code by heart, but people debugging code translated to assembly already do that.
The big difference is being able to unleash scripts that invalidate an enormous number of hypotheses very fast and that can analyze the data.
Doing that by hand used to take hours, so it would be a last-resort approach. Now that's very cheap, so validating many hypotheses is way cheaper!
I feel like my "debugging ability" in terms of value delivered has gone way up. For skill, it's changing, and I cannot tell, but the value I am delivering in debugging sessions has gone way up
Speaking as someone who switched from mobile to web dev professionally 6 months ago: if you care about code quality, you'll develop that neural connection after some time.
But if you don't and there's no PR process (side projects), the motivation to form that connection is quite low.
> If you care about code quality, you'll develop that neural connection after some time.
No, because you can get LLMs to produce high quality code that has gone through an infinite number of refinement/polish cycles and is far more exhaustive than the code you would have written yourself.
Once you hit that point, you find yourself in a directional/steering position divorced from the code since no matter what direction you take, you'll get high quality code.
Yes, you certainly can argue that, but you'd be wrong. The primary selling point of LLMs is that they solve the problem of needing skill to get things done.
I suggest you read the sales pitches that these products have been making. Again, when I say that this is the selling point, I mean it: This is why management is buying them.
I've read the sales pitches, and they're not about replacing the need for skill. The Claude Design announcement from yesterday (https://www.anthropic.com/news/claude-design-anthropic-labs) is pretty typical in my experience. The pitch is that this is good for designers, because it will allow them to explore a much broader range of ideas and collaborate on them with counterparties more easily. The tool will give you cool little sliders to set the city size and arc width, but it doesn't explain why you would want to adjust these parameters or how to determine the correct values; that's your job.
I understand why a designer might read this post and not be happy about it. If you don't think your management values or appreciates design skill, you'd worry they're going to gloss over the bullet points about design productivity, and jump straight to the one where PMs and marketers can build prototypes and ignore you. But that's not what the sales pitch is focused on.
The majority of examples in the document you linked describe 'person without <skill> can do thing needing <skill>'. It's very much selling 'more output, less skill'
They purportedly solve the problem of needing skill to get things done. IME, this is usually repeated by VC backed LLM companies or people who haven’t knowingly had to deal with other people’s bad results.
This all bumps up against the fact that most people default to “you use the tool wrong” and/or “you should only use it to do things where you already have firm grasp or at least foundational knowledge.”
It also bumps against the fact that the average person is using LLMs as a replacement for standard Google search.
I see it completely the opposite way, you use an LLM and correct all its mistakes and it allows you to deliver a rough solution very quickly and then refine it in combination with the AI but it still gets completely lost and stuck on basic things. It’s a very useful companion that you can’t trust, but it’s made me 4-5x more productive and certainly less frustrated by the legacy codebase I work on.
Yeah I wholeheartedly disagree with this. Because I understand the basics of coding I can understand where the model gets stuck and prompt it in other directions.
If you don't know what's going on through the whole process, good luck with the end product.
You're learning at your standard rate of learning, you're just feeding yourself over-confidence on how much you're absorbing vs what the LLM is facilitating you rolling out.
The latent assumption here is that learning is zero sum.
That you can take a 30 year old from 1856, bring them into the present day, and they will learn whatever subject as fast as a present-day 20 year old.
That teachers don't matter.
That engagement doesn't matter.
Learning is not zero sum. Some cultural background makes learning easier, some mentoring makes it easier, and some techniques increase engagement in ways that increase learning speed.
The question is not whether you could do all of it without AI, but whether you can now do any of it that you couldn't before.
Not everyone learns at the same pace and not everyone has the same fault tolerance threshold. In my experience some people are what I call "Japanese learners", perfecting by watching. They will learn with AI but would never do it themselves out of fear of getting something wrong, even though they understand most of it; others, whom I call "western learners", will start right away and "get their hands dirty" without much knowledge, and also get it wrong right away. Both are valid learning strategies fitting different personalities.
If your child says they've learned their multiplication tables but they can't actually multiply any numbers you give them do they actually know how to do multiplication? I would say no.
For some reason people are perfectly able to understand this in the context of, say, cursive, calculator use, etc., but when it comes to their own skillset somehow it's going to be really different.
It’s quite possible to be deep into solving a problem with an LLM guiding you where you’re reading and learning from what it says. This is not really that different from googling random blogs and learning from Stack Overflow.
Assuming everyone just sits there dribbling whilst Claude is in YOLO mode isn’t always correct.
>> I am learning a new skill with instructor at an incredible rate
> Could you do it again on your own?
Can you see how nonsensical your stance is? You're either straight up accusing GP of lying about learning something at an increased rate OR suggesting that if they couldn't learn that, presumably at the same rate, on their own, they're not learning anything.
It's not very wise to project your own experiences on others.
Actually, it’s much like taking a physics or engineering course, being fully able to explain the class that day afterwards, and yet realizing later when you are doing the homework that you did not actually understand it as fully as you thought you did.
I would argue that if you've just watched videos about building computers and haven't sat down and done one yourself, then yeah I don't see any evidence that you've learned how to build a computer.
No, it is an equally snarky response to a person being snarky about the usefulness of AI agents.
It does seem like there is a cult of people who categorically see LLMs as being poor at anything, without that being founded in any experience other than their 2023 afternoon of playing around with one.
Who cares? Why are people so invested in trying to “convert” others to see the light?
Can’t you be satisfied with outcompeting “non believers”? What motivates you to argue on the internet about it? Deep down are you insecure about your reliance on these tools or something, and want everyone else to be as well?
It's partly that, but also reading and understanding something at a surface level vs generating it yourself are different skills with different depths. If you're learning a language, you can get good at listening without getting good at speaking, for example.
Yeah I am worried about skill atrophy too. Everyone uses a compiler these days instead of writing assembly. Like who the heck is going to do all the work when people forget how to use the low level tools and a compiler has a bug or something?
And don’t get me started on memory management. Nobody even knows how to use malloc(), let alone brk()/mmap(). Everything is relying on automatic memory management.
I mean when was the last time you actually used your magnetized needle? I know I am pretty rusty with mine.
Snark aside, this is an actual problem for a lot of developers in varying degrees; not understanding anything about the layers below makes for terrible layers above in very many situations.
Another aspect I haven’t seen discussed too much: suppose your competitor is 10x more productive with AI, and to stay relevant you also use AI and become 10x more productive. Does the business actually grow enough to justify the extra expense? Or are you pretty much in the same state as you were without AI, but you are both paying an AI tax to stay relevant?
This is the “ad tax” reasoning, but ultimately I think the answer is greater efficiency. So there is a real value, even if all competitors use the tools.
It’s like saying clothing manufacturers are paying the “loom tax” when they could have been weaving by hand…
Software development is not a production line, the relationship between code output and revenue is extremely non-linear.
Where producing 2x the t-shirts will get you ~2x the revenue, it's quite unlikely that 10x the code will get you even close to 2x revenue.
With how much of this industry operates on 'Vendor Lock-in' there's a very real chance the multiplier ends up 0x. AI doesn't add anything when you can already 10x the prices on the grounds of "Fuck you. What are you gonna do about it?"
Yep and in a vendor lock in scenario, fixing deep bugs or making additions in surgical ways is where the value is. And Claude helps you do that, by giving you more information, analyzing options, but it doesn’t let you make that decision 10x faster.
We already know how to multiply the efficiency of human intelligence to produce better quality than LLMs and nearly match their productivity - open source - in fact coding LLMs wouldn't even exist without it.
Open source libraries and projects together with open source AI is the only way to avoid the existential risks of closed source AI.
Where's the evidence of competitors being 10x more productive? So far, everyone is simply bragging about how much code they have shipped last week, but that has zero relevance when it comes to productivity
I work at a 20-year-old mid-sized SaaS company. As long as the company has been around, product managers have longed for more engineers and strategies for engineers to ship features faster. As of around February, those same product managers across the org are complaining that they can't keep up with the pace at which engineers are shipping their features. This isn't just lines of code. This is the entire company trying to figure out how to help the PMs because engineers suddenly stopped being the bottleneck.
I don't know about 10x, but this could only happen if PMs suddenly got really lazy or the engineers actually got at least 1.5x faster. My gut says it's way more because we're now also consistently up to date on our dependencies and completing massive refactors we were putting off for years.
There are lots of reasons this could be the case. Quality suddenly changed, the nature of the work changed, engineers leveled up... But for this to have happened consistently across a bunch of engineering teams is quite the coincidence if not this one thing we are all talking about.
Read it as just a given rate. The number doesn’t matter too much here: if company B believes company A's claims that it is N times more productive, that’s enough to force B to adopt the same tooling.
I feel like a lot of the AI advocacy today is like the Cloud advocacy of a few years ago or the Agile advocacy before that. It's this season's silver bullet to make us all 10x more effective according to metrics that somehow never translate into adding actually useful functionality and quality 10x as fast.
The evangelists told us 20 years ago that if we weren't doing TDD then we weren't really professional programmers at all. The evangelists told us 10 years ago that if we were still running stuff locally then we must be paying a fortune for IT admin or not spending our time on the work that mattered. The evangelists this week tell us that we need to be using agents to write all our code or we'll get left in the dust by our competitors who are.
I'm still waiting for my flying car. Would settle for some graphics software on Linux that matches the state of the art on Windows or even reliable high-quality video calls and online chat rooms that don't make continental drift look fast.
The alternative is probably also true. If your F500 competitor is also handicapped by AI somehow, then you're all stagnant, maybe at different levels. Meanwhile Anthropic is scooping up software engineers it supposedly made irrelevant with Mythos and moving into literally 2+ new categories per quarter
Either the business grows, or the market participants shed human headcount to find the optimal profit margin. Isn’t that the great unknown: what professions are going to see headcount reduction because demand can’t grow that fast (like we’ve seen in agriculture), and which will actually see headcount stay the same or even expand, because the market has enough demand to keep up with the productivity gains of AI? Increasingly I think software writ large is the latter, but individual segments in software probably are the former.
The cost is so small relative to the increase. The cost whining on HN is bizarre to me. Feels like everyone here is on an individual plan and has no understanding of what margins look like for actual business.
Meta pays $750k+ TC and makes far more profit/eng, do you think they care about $5k/eng/mo in inference? A 1.1x increase would be so significant that it would justify the cost easily, especially when you can just compress comps to make up for it
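The arithmetic behind that claim can be checked with a quick sketch. It uses only the figures quoted above ($5k per engineer per month in inference, $750k total compensation) and assumes, conservatively, that an engineer's output is worth only what they are paid; the function name is made up for illustration.

```python
def break_even_gain(inference_per_month: float, total_comp_per_year: float) -> float:
    """Fractional productivity gain needed for inference spend to pay for itself.

    Assumption (not from the thread): an engineer's output is valued at
    exactly their total compensation, which understates it if profit per
    engineer is higher than comp.
    """
    annual_inference = inference_per_month * 12
    return annual_inference / total_comp_per_year


gain = break_even_gain(5_000, 750_000)
print(f"break-even productivity gain: {gain:.1%}")  # 60,000 / 750,000 = 8.0%
```

Under that assumption the spend breaks even at an 8% gain, so the 1.1x (10%) figure in the comment would clear it; whether a given team actually sees that gain is a separate question.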
What? You don't think businesses do financial planning and calculations for profit margins?
Do you really think they go on vibes - "welp, this AI thing seems to improve developer performance, I guess. Heck, what's an extra 5k per developer anyways, amirite".
Well, maybe they really do in your neck of the woods. Explains a lot, I guess.
Yes most companies do in fact operate like this. There are tens of thousands of companies that will pay more for the best thing and call it at that, because the cost is dwarfed by what even marginal gains in quality unlock for the business.
Open models keep closing the eval gap for many tasks, and local inference continues to be increasingly viable. What's missing isn't technical capability, but productized convenience that makes the API path feel like the only realistic option.
Frontier labs are incentivized to keep it that way, and they're investing billions to make AI = API the default. But that's a business model, not a technical inevitability.
im hoping and praying that local inference finds its way to some sort of baseline that we're all depending on claude for here. that would help shape hardware designs on personal devices, probably something in the direction of what apple has been doing.
ive had to like tune out of the LLM scene because it's just a huge mess. It feels impossible to actually get benchmarks, it's insanely hard to get a grasp on what everyone is talking about, bots galore championing whatever model, it's just way too much craze and hype and misinformation. what I do know is we can't keep draining lakes with datacenters here and letting companies that are willing to heel turn on a whim basically control the output of all companies. that's not going to work, we collectively have to find a way to make local inference the path forward.
everyone's foot is on the gas. all orgs, all execs, all peoples working jobs. there's no putting this stuff down, and it's exhausting but we have to be using claude like _right now_. pretty much every company is already completely locked in to openai/gemini/claude and for some unfortunate ones copilot. this was a utility vendor lock in capture that happened faster than anything ive ever seen in my life & I already am desperate for a way to get my org out of this.
I'm frustrated that there's not "solid" instructional tooling. I either see people just saying "keep trying different prompts and switching models until you get lucky" or building huge cantilevered toolchains that seem incredibly brittle, and even then, how well do they really work?
I get choice paralysis when you show me a prompt box-- I don't know what I can reasonably ask for and how to best phrase it, so I just panic. It doesn't help when we see articles saying people are getting better outcomes by adding things like "and no bugs plz owo"
I'm sure this is by design-- anything with clear boundaries and best practices would discourage gacha style experimentation. Can you trust anyone who sells you a metered service to give you good guidance on how to use it efficiently?
yea that is probably the worst part of these techs becoming mainstream services and local-LLM'ing taking off in general: working with them at many points in any architecture no longer feels... deterministic i guess. way too fucking much "here's what i use" but no real best practices yet, just a lot of vague gray area and everyone's still in discovery-mode on how to best find some level of determinism or workflow, and the ways we are benchmarking are seriously a moving target. everyone has their own branded take on what the technology is and their own branded approach on how to use it, and it's probably the murkiest and foggiest time to be in technology fields that i've ever seen :\ seems like weekly/monthly something is outdated, not just the models but the tooling people are parroting as the current best tooling to use. incredibly frustrating. there's simply too much ground to cover for any one person to have any absolute takes on any of it, and because a handful of entities are currently leading the charge draining lakes and trying to compete for every person's and every business's money, there are zero organized frameworks at the top to make some sense of this. they all are banking on their secret sauce, and i _really_ want us all to get away from this. local inference has to succeed imo but goddamn there needs to be some collective working together to rally behind some common strats/frameworks here. im sure there's already countless committees that have been established to try and get in front of this but even that's messy.
i don't know how else to phrase it: this feels like such an unstable landscape, "beta" software/services are running rampant in every industry/company/org/etc and there's absolutely no single resource we can turn to to help stay ahead of & plan for the rapidly-evolving landscape. every, and i mean every company, is incredibly irresponsible for using this stuff. including my own. once again though, cat's already out of the bag. now we fight for our lives trying to contain it and ensure things are well understood and implemented properly...which seems to be the steepest uphill battle of my life
I'm hopeful that new efficiencies in training (Deepseek et al.), the impressive performance of smaller models enhanced through distillation, and a glut of past-their-prime-but-functioning GPUs all converge to make good-enough open/libre models cheap, ubiquitous, and less resource-intensive to train and run.
> we don't want a hard dependency on another multi-billion dollar company just to write software
My manager doesn't even want us to use copilot locally. Now we are supposed to only use the GitHub copilot cloud agent. One shot from prompt to PR. With people like that selling vendor lock-in for them, these companies like GitHub, OpenAI, Anthropic etc don't even need sales and marketing departments!
Spent a lot of time with "open models." None of them come close. They are benchmaxxed. But you won't hear many of the open model fans on HN admit this.
The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?
No one working on a serious project at a serious company is downgrading their agent's intelligence for a marginal cost saving. Downgrading your model is like downgrading the toilet paper on your yacht.
> The open model mentality is also just so bizarre to me. You're going to use an inferior model to save, what, a couple hundred bucks a month? Is your time really worth that little?
I agree that people who claim that open models are as good as claude/openai/z are lying, delusional, or not doing very much. I've tried them all, including GLM 5.1.
GLM is not bad, but the hardware needed will never pay for itself vs. just using a commercial provider through its API.
That being said, you're being reductive here. For many use cases local models offer advantages that can't be obtained through a commercial API: privacy, ownership of the entire stack, predictability. They can't be rugpulled, they can't snitch on you. They will not give you a 503.
Those advantages are very valuable for things like a local assistant, as an agent, for data extraction, for translations, for games (role playing and whatnot), etc.
That being said I know that many people are like you, they don't give a second thought about privacy. They'd plug Anthropic to their brain if they could. So I understand the sentiment. I just think that you should in turn try to understand why someone would use an open model.
I have it as failover to Opus 4.6 in a Claude proxy internally. People don't notice a thing when it triggers, maybe a failed tool call here and there (harness remains CC not OC) or a context window that has gone over 200k tokens or an image attachment that GLM does not handle, otherwise hunky-dory all the way. I would also use it as permanent replacement for haiku at this proxy to lower Claude costs but have not tried it yet. Opus 4.7 has shaken our setup badly and we might look into moving to Codex 100% (GLM could remain useful there too).
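For anyone wondering what that kind of failover looks like in practice, here is a minimal sketch. It is not the commenter's actual proxy: `Request`, `PrimaryUnavailable`, `route` and the two backend callables are all made-up names for illustration. The only details taken from the comment are the two cases where the fallback is known to struggle (contexts over 200k tokens and image attachments).

```python
from dataclasses import dataclass
from typing import Callable

# Limit mentioned in the comment; treat it as an assumption for your own setup.
FALLBACK_MAX_TOKENS = 200_000


@dataclass
class Request:
    """Hypothetical request shape; a real proxy would carry the full message list."""
    prompt: str
    token_count: int
    has_image: bool = False


class PrimaryUnavailable(Exception):
    """Raised by the (hypothetical) primary backend on an outage or 5xx."""


def fallback_can_handle(req: Request) -> bool:
    # The fallback model is skipped for oversized contexts and image inputs.
    return req.token_count <= FALLBACK_MAX_TOKENS and not req.has_image


def route(req: Request,
          primary: Callable[[Request], str],
          fallback: Callable[[Request], str]) -> str:
    """Try the primary backend first; fall back only when it is down
    and the request is something the fallback can serve."""
    try:
        return primary(req)
    except PrimaryUnavailable:
        if not fallback_can_handle(req):
            raise  # surface the outage instead of returning a degraded answer
        return fallback(req)
```

Re-raising for requests the fallback can't serve is a design choice: a visible error is arguably easier to debug than a silently truncated context.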
That's a lame attitude. There are local models that are last year's SOTA, but that's not good enough because this year's SOTA is even better yet still...
I've said it before and I'll say it again, local models are "there" in terms of true productive usage for complex coding tasks. Like, for real, there.
The issue right now is that buying the compute to run the top end local models is absurdly unaffordable. Both in general but also because you're outbidding LLM companies for limited hardware resources.
If you have a $10K budget, you can legit run last year's SOTA agentic models locally and do hard things well. But most people don't or won't, nor does it make cost-effective sense vs. currently subsidized API costs.
I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can. I would love to be able to say I have X technique for compensating for the model shortfall, but my experience so far has been that bigger, later models outperform older, smaller ones. I genuinely hope this changes though. I understand the investment that it has taken to get us to this point, but intelligence doesn't seem like it's something that should be gated.
Right; but every major generation has had diminishing returns on the last. Two years ago the difference was HUGE between major releases, and now we're discussing Opus 4.6 Vs. 4.7 and people cannot seem to agree if it is an improvement or regression (and even their data in the card shows regressions).
So my point is: If you have the attitude that unless it is the bleeding edge, it may as well not exist, then local models are never going to be good enough. But truth is they're now well exceeding what they need to be to be huge productivity tools, and would have been bleeding edge fairly recently.
I feel like I'm going to have to try the next model. For a few cycles yet. My opinion is that Opus 4.7 is performing worse for my current work flow, but 4.6 was a significant step up, and I'd be getting worse results and shipping slower if I'd stuck with 4.5. The providers are always going to swear that the latest is the greatest. Demis Hassabis recently said in an interview that he thinks the better funded projects will continue to find significant gains through advanced techniques, but that open source models figure out what was changed after about 6 months or so. We'll see I guess. Don't get me wrong, I'd love to settle down with one model and I'd love it to be something I could self host for free.
> I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can.
Don't you understand that by choosing the best model we can, we are, collectively, step by step devaluing what our time is worth? Do you really think we can all keep our fancy paychecks while we keep using AI?
Do you think if you or I stopped using AI that everyone else would too? We're still what we always were - problem solvers who have gained the ability to learn and understand systems better than the general population, communicate clearly (to humans and now AIs). Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever. As the amount of software grows, so will the need for people who know how to manage the complexity that comes with it.
> Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever.
There were always jobs that required those "many more skills" but didn't require any programming skills.
We call those people Business Analysts and you could have been doing it for decades now. You didn't, because those jobs paid half what a decent/average programmer made.
Now you are willingly jumping into that position without realising that the lag between your value (i.e. half your salary, or less) and what you are actually paid will eventually disappear.
I guess we will need to wait and see if AI can remove ALL of the complexity that requires a software engineer over a business analyst. I can't currently believe that it will. BAs I've worked with vary in technical capability from 'having coded before and understanding DB schema basics and network architecture' to 'I know how the business works but nothing about computers'. If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe. But while AI is a probabilistic technology manipulating deterministic systems, we will always need people to understand what's going on, and whether they write a lot of code or not, they will be engineers, not analysts. Whether it's more or fewer of those people, we will see.
> If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe.
They don't need to all run on the same frameworks, they just need to run on documented frameworks.
What possible value can you bring to a BA?
The system topology (say, if the backend was microservices vs Lambda vs something-else)? The LLM can explain to the BA what their options are, and the impact of those options.
The framework being used (Vue, or React, or something else)? The AI can directly twiddle that for the BA.
Solving a problem? If the observability is set up, the LLM can pinpoint almost all the problems too, and with a separate UAT or failover-type replica, can repro, edit, build, deploy and test faster than you can.
Like I already said, if[1] you're now able to build or enhance a system without actually needing programming skills, why are you excited about that? You could always do that. It's just that it pays half what programming skills gets you.
You (and many others who boast about not writing code since $DATE) appear to be willingly moving to a role that already pays less, and will pay even less once the candidates for that role double (because now all you programmers are shifting towards it).
It's supply and demand, that's all.
--------------
[1] That's a very big "If", I think. However, the programmers who are so glad to not program appear to believe that it's a very small "If", because they're the ones explaining just how far the capabilities have come in just a year, and expect the trend to continue. Of course, if the SOTA models never get better than what we have now, then, sure - your argument holds - you'll still provide value.
First, making sure to offer an upvote here. I happen to be VERY enthusiastic about local models, but I've found them to be incredibly hard to host, incredibly hard to harness, and, despite everything, remarkably powerful if you are willing to suffer really poor token/second performance...
>perhaps we can come up with something like the "linux/postgres/git/http/etc" of the LLMs
I fear that this may not be feasible in the long term. The open-model free ride is not guaranteed to continue forever; some labs currently offer them for free for publicity after receiving millions in VC funding, but that's not a sustainable business model. Models cost millions/billions in infrastructure to train. It's not like open-source software where people can just volunteer their time for free; here we are talking about spending real money upfront, for something that will become obsolete in months.
Current AI model "production" is more akin to an industrial endeavor than open-source arrangements we saw in the past. Until we see some breakthrough, I'm bearish on "open models will eventually save us from reliance on big companies".
If you mean obsolete in the sense of "no longer fit for purpose" I don't think that's true. They may become obsolete in terms of "can't do hottest new thing" but that's true of pretty much any technology. A capable local model that can do X will always be able to do X, it just may not be able to do Y. But if X is good enough to solve your problem, why is a newer better model needed?
I think if we were able to achieve ~Opus 4.6 level quality in a local model that would probably be "good enough" for a vast number of tasks. I think it's debatable whether newer models are always better - 4.7 seems to be somewhat of a regression for example.
That's fine. When the "best of the best" is offered only by a couple of companies that are not looking into our best interests, then we can discard them
LMArena isn't very useful as a benchmark, however I can vouch for the fact that GLM 5.1 is astonishingly good. Several people I know who have a $100/mo Claude Code subscription are considering cancelling it and going all in on GLM, because it's finally gotten (for them) comparable to Opus 4.5/6. I don't use Opus myself, but I can definitely say that the jump from the (imvho) previous best open weight model Kimi K2.5 to this is otherworldly — and K2.5 was already a huge jump itself!
Mind you, a 30B model (3B active) is not going to be comparable to Opus. There are open models that are near-SOTA but they are ~750B-1T total params. That's going to require substantial infrastructure if you want to use them agentically, scaled up even further if you expect quick real-time response for at least some fraction of that work. (Your only hope of getting reasonable utilization out of local hardware in single-user or few-users scenarios is to always have something useful cranking in the background during downtime.)
For a business with ten or more engineers/people-using-ai, it might still make sense to set this up. For an individual though, I can’t imagine you’d make it through to positive ROI before the hardware ages out.
It's hard to tell for sure because the local inference engines/frameworks we have today are not really that capable. We have barely started exploring the implications of SSD offload, saving KV-caches to storage for reuse, setting up distributed inference in multi-GPU setups or over the network, making use of specialty hardware such as NPUs etc. All of these can reuse fairly ordinary, run-of-the-mill hardware.
I'm backing up a big dataset onto tapes, so I wanted to automate it. I have an idle 64GB VRAM setup in my basement, so I decided to experiment and tasked it with writing an LTFS implementation. LTFS is an open standard for filesystems for tapes, and there's an implementation in C that can be used as the baseline.
So far, over the last 2 days, Qwen 3.6 has created a functionally equivalent Golang implementation that works against the flat-file backend. I'm extremely impressed.
I want to bump this more than just a +1 by recommending everyone try out OpenCode. It can still run on a Codex subscription so you aren’t in fully unfamiliar territory but unlocks a lot of options.
The thing I dislike about OpenCode is the lack of capabilities of their editor, also, resource intensive, for some reason on a VM it chuckles each 30 mins, that I need to discard all sessions, commits, etc.
I don't know if it is Bun-related, but in Task Manager it is almost always at the top for CPU usage. Turns out that, for me, Bun is not production-ready at all.
Wish Zed editor had something like BigPickle which is free to use without limits.
Qwen’s 30B models run great on my MBP (M4, 48GB) but the issue I have is cooling - the fan exhaust is straight onto the screen, which I can’t help thinking will eventually degrade it, given the thermal cycling it would go through. A Mac Studio makes far more sense for local inference just for this reason alone.
I have 24GB VRAM available and haven't yet found a decent model or combination.
Last one I tried is Qwen with continue, I guess I need to spend more time on this.
im currently running a custom Gemma4 26b MoE model on my 24gb m2... super fast and it beat deepseek, chatgpt, and gemini in 3 different puzzles/code challenges I tested it on. the issue now is the low context... I can only do 2048 tokens with my vram... the gap is slowly closing on the frontier models
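On why context length eats memory: for a plain transformer with full attention in every layer, the KV cache grows linearly with the number of tokens. A rough sketch of the estimate is below. The layer/head numbers are made up for illustration and are not the config of Gemma or any other real model; models with sliding-window or otherwise compressed attention need less than this.

```python
def kv_cache_bytes(n_layers: int,
                   n_kv_heads: int,
                   head_dim: int,
                   n_tokens: int,
                   bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for full attention in every layer.

    The leading 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# HYPOTHETICAL config, chosen only to make the arithmetic easy to follow.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for tokens in (2048, 32768):
    mib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens) / 2**20
    print(f"{tokens} tokens -> {mib} MiB")
# 2048 tokens -> 256.0 MiB
# 32768 tokens -> 4096.0 MiB
```

Whatever memory is left after the weights are loaded is what bounds the context, so quantizing the cache or the weights is the usual lever.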
It's a MoE model so I'd assume a cheaper MBP would simply result in some experts staying on CPU? And those would still have a sizeable fraction of the unified memory bandwidth available.
I haven’t tried this myself yet but you would still need enough non-vram ram available to the cpu to offload to cpu, right? This is a fully novice question, I have not ever tried it.
You're correct. If you don't have enough RAM for the model, it can still run but most of it will run on the CPU and be continuously reloaded from the SSD (through mmap).
A medium MoE like 35B can still achieve usable speeds in that setup, mind you, depending on what you're doing.
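To make the mmap part concrete: mapping a file does not read it into RAM up front; the OS pages in only the regions that get touched, and can evict them again under memory pressure. A minimal sketch using Python's standard `mmap` module is below. The "weights" file here is just dummy bytes, and `make_fake_weights`/`read_slice` are names invented for this example, not part of any inference engine.

```python
import mmap
import os
import tempfile


def make_fake_weights(size: int) -> str:
    """Write a dummy file standing in for a model file; returns its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 256 for i in range(size)))
    return path


def read_slice(path: str, offset: int, length: int) -> bytes:
    """Map the whole file read-only and copy out one slice.

    Only the pages backing [offset, offset + length) need to be
    loaded from disk; the rest of the file is never read.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


path = make_fake_weights(1 << 20)   # 1 MiB stand-in for a much larger file
print(read_slice(path, 4096, 4))    # b'\x00\x01\x02\x03'
os.remove(path)
```

With a MoE model only the experts selected for a given token get touched, which is one plausible reason the slowdown is tolerable there.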
I'm increasingly thinking the same as our spend on tokens goes up.
If you have HPC or Supercompute already, you have much of the expertise on staff already to expand models locally, and between Apple Silicon and Exo there are some amazing solutions out there.
Now, if only the rumors about Exo expanding to Nvidia are true..
>perhaps we can come up with something like the "linux/postgres/git/http/etc" of the LLMs: something we all can benefit from while it not being monopolized by a single billionarie company
Training and inference cost money, so we would have to pay for them.
My understanding is that the major part of the cost of a given model is the training - so open models depend on the training that was done for frontier models? I'm finding it hard to imagine (e.g.) RLHF being fundable through a free software type arrangement.
No, the training between proprietary and open models is completely different. The speculation that open models might be "distilled" from proprietary ones is just that, speculation, and a large portion of it is outright nonsense. It's physically possible to train on chat logs from another model but that's not "distilling" anything, and it's not even eliciting any real fraction of the other model's overall knowledge.
I don't know what to make of it, I am skeptical of OpenAI/Anthropic claims about distillation, but I did notice DeepSeek started sounding a lot like Claude recently.
This is part of the reason why I'm really worried that this is all going to result in a greater economic collapse than I think people are realizing.
I think companies that are shelling out the money for these enterprise accounts could honestly just buy some H100 GPUs and host the models themselves on premises. GitHub Copilot Enterprise charges $40 per user per month (this can vary depending on your plan of course), and at this price 1000 users comes out to $480,000 a year. Maybe I'm missing something, but that's roughly what you're going to be spending to get a full-fledged hosting setup for LLMs.
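The arithmetic above is easy to check, and it is worth separating the one-off hardware spend from the yearly running cost when comparing. A quick sketch follows: the $40/user/month and 1000 users come from the comment, while the hardware and running-cost figures are placeholders I made up, so plug in your own quotes.

```python
from typing import Optional


def annual_seat_cost(users: int, price_per_user_month: float) -> float:
    """Yearly bill for a per-seat subscription."""
    return users * price_per_user_month * 12


def breakeven_years(hardware_cost: float,
                    annual_running_cost: float,
                    annual_subscription: float) -> Optional[float]:
    """Years until self-hosting pays for itself, or None if it never does."""
    yearly_savings = annual_subscription - annual_running_cost
    if yearly_savings <= 0:
        return None
    return hardware_cost / yearly_savings


subscription = annual_seat_cost(1000, 40)   # figures from the comment
print(subscription)                         # 480000

# HYPOTHETICAL: $480k of hardware, $120k/year for power, hosting and staff.
print(round(breakeven_years(480_000, 120_000, subscription), 2))  # 1.33
```

This only compares dollars. It says nothing about whether the hardware can actually serve that many people at once, which is a separate sizing question.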
Most companies don't want to host it themselves. They want someone to do it for them, and they are happy to pay for it. If it makes their lives easier and does not add complexity, then it has a lot of value.
Out of curiosity, how many concurrent users could you get with a hosting setup at that price? If let's say 10% of those 1000 users were using it at the same time would it handle it? What about 30% or 100%?
You made a good point that I didn't think through fully. It's the concurrent user aspect that heavily impacts things. Currently, you'd probably need quite a bit more investment to the point of having a mini data center to do what I'm proposing.
However, given the advances we've been seeing in context compression and in the capabilities of smaller models, I don't think it'd be too far off to see something like what I'm talking about within the next 5 years.
Yeah it seems so. Anthropic has entered the enshittification phase. They got people hooked on their SOTA models, so now it's time to keep releasing models with marginal performance increases at a 40% higher token price. The problem is that both Anthropic and OpenAI have no income other than AI. Can't Google just drown them out with cheaper prices over the long run? It seems like an attrition battle to me.
> I think that's the way forward. Actually it would be great if everybody would put more focus on open models,
I'm still surprised top CS schools are not investing in having their students build models, I know some are, but like, when's the last time we talked about a model not made by some company, versus a model made by some college or university, which is maintained by the university and useful for all.
It's disgusting that OpenAI still calls itself "Open AI" when they aren't truly open.
We'll be keeping an eye on open models (of which we already make good use of). I think that's the way forward. Actually it would be great if everybody would put more focus on open models, perhaps we can come up with something like the "linux/postgres/git/http/etc" of the LLMs: something we all can benefit from while it not being monopolized by a single billionarie company. Wouldn't it be nice if we don't need to pay for tokens? Paying for infra (servers, electricity) is already expensive enough