throwup238's comments | Hacker News

My heuristic is: if you have to ask, you're probably not the right person for the job.

What you're describing sounds like the wantrepreneur process, and it usually doesn't end well because you will likely have no existing experience in the business domain you are targeting, unless your target market happens to be developer tooling*.

Most successful startups are made by people who already know the problem they are trying to solve because they've experienced it first hand. They have the industry contacts to quickly find early customers and their search for "product-market fit" is usually about whether clients will pay enough to make the startup worthwhile rather than "do clients even want what I'm building."

Ideas are a dime a dozen. Competent engineers are a dime a dozen (relatively). Distribution is what builds startups and that requires industry experience and contacts, or a cofounder that can carry the domain side of the business (in which case they will be the ones filtering ideas).

* I'm assuming you're an engineer. Otherwise you're an idea guy without a clue on how to create viable ideas.


> (And it's not going to happen since Windows is nothing but a quarterly result side note at this point)

Azure will be the straw that breaks the camel's back. Their stock price has depended on it since Cloud and AI got restructured into a single department (it was Nadella's baby before he became CEO), and Azure was already pretty bad before vibe coding entered the picture.


> Mayybe for some things you could set it up so that the screen output is livestreamed back into the agent, but I highly doubt that anyone is doing that for agents like this yet

What do you mean by streaming? LLMs aren't advanced enough yet to consume a live video feed, but people have been feeding them screenshots from Playwright and desktop apps for years (Anthropic even released the Computer Use feature based on this).

Gemini has the best visual intelligence but all three of the major models have supported this for a while. I don’t think it’d help with fixing subtle problems in shadows but it can fix other gui bugs using visual feedback.


> Is anyone exploring the (imo more practically useful today) space of using agents to put together better changes vs "more commits"?

That’s what I’ve been focused on for the last few weeks with my own agent orchestrator. The actual orchestration bit was the easy part; the key is making it self-improving via “workflow reviewer” agents that can create new reviewers specializing in catching a specific set of antipatterns, like swallowing errors. Unfortunately I've found that what counts as acceptable code quality is very dependent on project, organization, and even module (tests vs internal utilities vs production services), so prompt instructions like "don't swallow errors or use unwrap" make one part of the code better while another gets worse, creating a conflict for the LLM.
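To make "reviewer" concrete, here's a rough sketch of the shape I mean (all names invented; in the real thing each reviewer is an LLM prompt rather than a regex, but the interface is the same):

    // Hypothetical shape of the reviewer layer: each reviewer targets one
    // antipattern, and the orchestrator runs all of them over a candidate diff.
    trait Reviewer {
        fn antipattern(&self) -> &'static str;
        fn review(&self, diff: &str) -> Vec<String>; // findings, empty if clean
    }

    struct SwallowedErrorReviewer;

    impl Reviewer for SwallowedErrorReviewer {
        fn antipattern(&self) -> &'static str {
            "swallowed errors / unwrap in production code"
        }
        fn review(&self, diff: &str) -> Vec<String> {
            diff.lines()
                .filter(|l| l.contains(".unwrap()") || l.contains("let _ ="))
                .map(|l| format!("possible swallowed error: {}", l.trim()))
                .collect()
        }
    }

The "workflow reviewer" sits one level up: it reads findings over time and decides when a new specialized reviewer is worth spawning.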

The problem is that model eval was already the hardest part of using LLMs and evaluating agents is even harder if not practically impossible. The toy benchmarks the AI companies have been using are laughably inadequate.

So far the best I’ve got is “reimplement MINPACK from scratch using their test suite” which can take days and has to be manually evaluated.


Why would they revive the popularity of microservices? They can just as well be used to enforce strict module boundaries within a modular monolith, keeping the codebase coherent without splitting off microservices.

And that's why they call it a hot take. No, it isn't going to give rise to microservices. You absolutely can have your agent perform high-level decomposition while maintaining a monolith. A well-written, composable spec is awesome. This has been true for human and AI coders for a very, very long time. The hat trick has always been getting a well-written, composable spec. AI can help with that bit, and I find that is probably the best part of this whole tooling cycle. I can actually interact with an AI to build that spec iteratively. Have it be nice and mean. Have it iterate among many instances and other models, all that fun stuff. It still won't make your idea awesome or make anyone want to spend money on it, though.

That’s what I said about self driving cars nearly a decade ago!

The 80/20 rule is a painful lesson to internalize but it’s damn near a universal constant now. That last exponential improvement that takes LLMs over the finish line will take a lot longer than we think.


I think self driving cars is a good analog. We got lane centering and adaptive cruise control pretty much universally, and some systems are more advanced, but you cannot buy a fully autonomous car. Sure there’s Waymo and others pushing at the edge in very very limited contexts, but most people are still driving their own cars, just with some additional support. I suspect the same will be true for software engineering.

You can buy a Tesla with FSD today. It works. Sure there are some corner cases, but it's good enough in enough cases to get in, punch in a destination, and have it get you there. But you have to buy a Tesla, and they only come in electric. Waymos are fine too, you just have to live in a supported area. The real gem for self-driving is Comma.ai.

It's an aftermarket system, but it can be installed in most newish cars with a built-in lane guidance system. Even a Corolla. And it's really what we want. It's not self-driving; it'll just take you down the freeway and you don't have to do anything except wait for your GPS to tell you to exit (and then you have to exit). Freeway-grade curves in the road? Fine. Stop-and-go traffic? Handled. It's better than Tesla's FSD in two specific ways. One is that because it's not self-driving, it's way easier to trust the system: it's not going to change lanes or do anything surprising on you. The other is that it's not going to do something fancy, just brake or accelerate and follow the lane or the car in front of you. I highly recommend it to anyone who does any amount of freeway driving. If not for the coolness of it, then simply for the safety aspect. Now, I'm sure everyone here is a better driver than most (though that has a problem, mathematically), but this thing is better than a tired/angry/drunk version of someone else driving.

But as you point out, most people are still driving their own cars.

Which, I think is where we're going to see the software development industry going. There's gonna be the AI maximalists, who, like Waymo and FSD, will have AI basically do everything. And then there's the pragmatists, for whom AI doesn't do everything, just enough to be useful.

Then there's everyone else, still writing their own code. Thing is, an app on your computer isn't a car. It's $20/month, if not free if you're a total cheapskate, to get Codex or Claude or another assistant, vs many thousands of dollars for a self-driving or partially self-driving car. The other difference is time. The value of a partially self-driving car (FSD or Comma) is in reducing the mental fatigue of driving, and in improved safety, but a 7-hour road trip is still going to take 7 hours even if you're not driving. The only time a self-driving car saves time is if you're going cross-city in a Waymo and you're in the back seat working on your laptop. AI-assisted coding, though, is different. If AI lets me take on projects I wouldn't have taken on before, that's a win for me. If I'm able to write software faster and with fewer bugs with AI, that's also a win for me, but also a win for my employer. If, however, it goes the other way and I write more bugs, then that's a loss for me and my employer.


I'm not sure what "exponential improvement" would mean in this context, but large models have been a massively hyped and invested thing for what, three-four years or so, right?

And what do they run on? Information. The production of which is throttled by the technology itself, in part because the salespeople claim it can (and should) "replace" workers and thinkers, in part because many people have really low standards for entertainment and accept so-called slop instead of cheap tropes manually stitched together.

So it seems unlikely that they'll get fed the information they'd need to outpace the public internet, widely pirated books, and so on.


The counterpoint to this is that information is sealed up in bottles that previously weren't worth unsealing. How much can you charge for the source code to a program written in Zig to calculate the Fibonacci sequence? That's worth approximately zero. But generating a million tested programs, with source code, that have been run through a compiler and test suite, to be used as "information" suddenly becomes worth it for the AI labs to buy up, if not generate for themselves. So imo there's still a ways to go, even if the human Internet isn't growing as much post-AI as in all the years before it.

> At the same time, the amount of anti-patterns the LLM generates is higher than I am able to manage. No Claude.md and Skills.md have not fixed the issue.

This is starting to drive me insane. I was working on a Rust CLI that depends on Docker and Opus decided to just… keep the CLI going with a warning “Docker is not installed” before jumping into a pile of garbage code that looks like it was written by a lobotomized kangaroo, because it tries to use an Option<Docker> everywhere instead of making sure it's installed and quitting with an error if it isn’t.

What do I even write in a CLAUDE.md file? The behavior is so stupid I don’t even know how to prompt against it.
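For reference, the behavior I actually wanted is just a fail-fast check at startup, something like this minimal sketch (the helper name is mine, not Claude's output):

    use std::process::{exit, Command};

    // Fail fast if docker isn't available, instead of threading an
    // Option<Docker> through every code path.
    fn require_docker() {
        let ok = Command::new("docker")
            .arg("--version")
            .output()
            .map(|o| o.status.success())
            .unwrap_or(false);
        if !ok {
            eprintln!("error: this tool requires Docker, but `docker` was not found on PATH");
            exit(1);
        }
    }

    fn main() {
        require_docker();
        // ... the rest of the CLI can now assume Docker exists
    }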


Everything that's wrong with venture capitalism condensed into a single fifteen word sentence. Bravo.

If you can't provide a billion dollars worth of value, extract a billion dollars worth of grift!

I hear A16Z is hiring.


How is being outcompeted “grift”? I feel like I’m missing some context here.

Why do you start a startup? Is it to build an idea you believe in (and believe is potentially lucrative), or is it so you can go through the motions, say the trendy things, and get outcompeted because in the end you are primarily focused on getting acquired with a $1B exit?

Spoken like someone who has never started a business. Brex raised much less than $5b and Capital One apparently thinks it is worth more than that (otherwise they wouldn’t buy it).

This is called value creation.


I think the investors who put $300m in at a $12b valuation would disagree

I don’t think you understand how liquidation preferences work.

They will get $300m back.

Opportunity cost sure. But zero nominal loss.


Definitely. No company has ever overpaid for another company. No fraud or FOMO-driven overvaluation has ever occurred in an acquisition. And all acquisitions have always turned out for the best. It's all 100% pure value creation.

Your statement is true on average because the world’s economy is continuing to function.

> Your statement is true on average because the world’s economy is continuing to function.

The entire field of economics depends on ex post facto statements like this.


Oh wow, I don't even know where to begin with that.

Like, the world economy can't continue to function even if acquisitions were only 80% value creation on average? Or does the entire world economy depend on companies acquiring other companies with 100% value creation on average, such that it continuing to function logically implies 100% average value creation?


> can't continue to function even if acquisitions were only 80% value creation on average

The number is much much lower than that. Most acquisitions fail or don't have much impact.


"functioning" is doing a lot of heavy lifting here

Definitely. And some random guy on HN knows the value of Brex to Capital One better than Capital One does.

Brex can be worth $5b today and also be worth less in the future. These two realities don’t conflict. Acquisitions can and do end poorly. But the vast majority work well. I am not sure what you don’t understand about that?


Anthropic has been flying by the seat of their pants for a while now and it shows across the board. From the terminal flashing bug that’s been around for months to the lack of support to instabilities in Claude mobile and Code for the web (I get 10-20% message failure rates on the former and 5-10% on CC for web).

They’re growing too fast and it’s bursting the seams of the company. If there’s ever a correction in the AI industry, I think that will all quickly come back to bite them. It’s like Claude Code is vibe-operating the entire company.


The Pro plan quota seems to be getting worse. I can get maybe 20-30 minutes work done before I hit my 4 hour quota. I found myself using it more just for the planning phase to get a little bit more time out of it, but yesterday I managed to ask it ONE question in plan mode (from a fresh quota window), and while it was thinking it ran out of quota. I'm assuming it probably pulled in a ton of references from my project automatically and blew out the token count. I find I get good answers from it when it does work, but it's getting very annoying to use.

(On the flip side, Codex seems SO efficient with tokens that it can be hard to understand its answers sometimes; it rarely includes files without you adding them manually, and it often takes quite a few attempts to get the right answer because it's so strict about what it does each iteration. But I never run out of quota!)


Claude Code allegedly auto-includes the currently active file and often all visible tabs and sometimes neighboring files it thinks are 'related' - on every prompt.

The advice I got when scouring the internets was primarily to close everything except the file you’re editing and maybe one reference file (before asking Claude anything). For added effect add something like 'Only use the currently open file. Do not read or reference any other files' to the prompt.

I don't have any hard facts to back this up, but I'm sure going to try it myself tomorrow (when my weekly cap is lifted ...).


What does "all visible tabs" mean in the context of Claude Code in a terminal window? Are you saying it's reading other terminals open on the system? Also how do you determine "currently active file"? It just greps files as needed.

You can install VSCode extension and use "/ide" to connect them.

Do people actually use this mode? Having to approve diffs in the ide is too annoying.

Depends on my task. If it’s complex and my expectation is for Claude to get things wrong the diff preview is helpful.

Even then, I'd wait until it's had a chance to iterate and correct itself in a loop before I'd even consider looking at the output, or I end up babysitting it to prevent it from making mistakes it'd often recognise and fix itself if given the chance.

True. I’ve been strictly in the terminal for weeks and I have a stop hook which commits each iteration after successful rust compilation and frontend typechecks, then I have a small command line tool to quickly review last commit. It’s a pretty good flow!

You can tell it not to do that and it will show inline diffs.


Yes, it does exactly that. It also sends other prompts like generating 3 options to choose from, prefilling a reply like 'compile the code', etc. (I can confirm this because I connect CC to llama.cpp and use it with GLM-4.7. I see all these requests/prompts in the llama-server verbose log.)

You can stop most of this with:

    export DISABLE_NON_ESSENTIAL_MODEL_CALLS=1

And might as well disable telemetry, etc.:

    export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

I also noticed every time you start CC, it sends off > 10k tokens preparing the different agents. So try not to close / re-open it too often.

source: https://code.claude.com/docs/en/settings


I would always close claude to start a new chat... Guess I should stop doing that. Thanks for bringing my attention to those two env vars.

^ THIS

I've run out of quota on my Pro plan so many times in the past 2-3 weeks. This seems to be a recent occurrence. And I'm not even that active. Just one project, executed in Plan > Develop > Test mode, just one terminal. That's it. I keep hitting the limit and waiting for a quota reset every few hours.

What's happening @Anthropic ?? Anybody here who can answer??


[BUG] Instantly hitting usage limits with Max subscription: https://github.com/anthropics/claude-code/issues/16157

It's the most commented issue on their GitHub and it's basically ignored by Anthropic. Title mentions Max, but commenters report it for other plans too.


It's not a bug, it's a feature (for Anthropic).

It's not a bug, it's a poorly defined business model!

“After creating a new account, I can confirm the quota drains 2.5x–3x slower. So basically Max (5x) on an older account is almost like Pro on a new one in terms of quota. Pretty blatant rug pull tbh.”

lol


Your quota also seems to be higher after unsubscribing and resubscribing?

They'll also send you a free month of the 100 dollar plan if you unsubscribe to try and get you back.

Or (tested on the Max 20x plan): when the subscription renewal fails for any reason (they try to charge your card multiple times), you're still in for 2+ weeks until it dies.

This whole API-vs-plan thing looks weird to me. Why not force everyone to use the API? You pay for what you use; it's very simple. The API should be the most honest way to monetize, right?

This fixed subscription plan with vaguely specified quotas looks like they want to extract extra money from the users who pay $200 and don't use that value, while at the same time preventing other users from going over $200. I understand that it might work at scale, but it just feels a bit unfair to everyone?


You're welcome to use the API, it asks you to do that when you run out of quota on your Pro plan. The next thing you find out is how expensive using the API is. More honest, perhaps, but you definitely will be paying for that.

I tried the API once. Burned 7 dollars in 15 minutes.

The fixed-fee plan exists because the agent and the tools make internal choices/plans about cost. If you simply pay per API call, the only feedback to them that they are being too costly is you stopping.

If you look at tool calls like MCP and whatnot, you can see it gets ridiculous. Even something small, like calling the pal MCP from the prompt, is still burning tokens afaik. This is "nobody's" fault really, but you can see how the incentives line up, and we all need to think about how to make this entire space more usable.


Not a doctor or anything, but API usage seems to support more on-demand / spiky workloads at a much larger scale, whereas a single seat authenticated to Claude Code has controlled/set capacity and is generally more predictable and, as a result, easier to price?

API requests might have no cap, but they do cap Claude Code even on Max licenses, so it's easier to throttle as well if needed to control costs. Seems straightforward to me at any rate. Kinda like reserved instance vs. spot pricing models?


Consumers like predictable billing more than they care about getting the most bang for their buck and beancounters like sticky recurring revenue streams more than they care about maximizing the profit margins for every user.

I just like being able to make like $250 of API calls for $20.

If only it were API calls. I like using it through Claude Code. But it would be infinitely more flexible if my $200 subscription worked through the API.

I don't understand, you CAN use claude code through the API.

Yeah, but he can't use his $200 subscription for the API.

That's limited to accessing the models through code/desktop/mobile.

And while I'm also using their subscriptions because of the cost savings vs direct access, having the subscription be considerably cheaper than the usage billing rings all sorts of alarm bells that it won't last.


I very recently (~1 week ago) subscribed to the Pro plan and was indeed surprised by how fast I reached my quota compared to, say, Codex on a similar subscription tier. The UX of Claude Code is generally really cool, which left me with a bit of a bittersweet feeling: after just making basic planning and code changes I was already out of quota, without ever truly exploring the possibilities - various ways of using subagents, testing background stuff, etc.

I remember a couple of weeks ago, when people raved about Claude Code, I got a feeling there was no way this is sustainable - they must be burning tokens like crazy if it's used as described. Guess Anthropic did the math as well and now we're here.

I use opencode with codex after all the shenanigans from anthropic recently. You might want to give that a shot!

The best thing about the max plan has been that I don’t have “range anxiety” with my workflows. This opens me to trying random things on a whim and explore the outer limits of the LLM capabilities more.

Use cliproxyapi and use any model in CC. I use Codex models in CC and it's the best of both worlds!

Like a good dealer, they gave you a cheap/free hit and now you want more. This time you're gonna have to pay.

I've been hitting the limit a lot lately as well. The worst part is I try to compact things and check my limits using the / commands and can't make heads or tails of how much I actually have left. It's not clear at all.

I've been using CC until I run out of credits and then switch to Cursor (my employer pays for both). I prefer Claude but I never hit any limits in Cursor.


Hmm, are you using the /usage command? There’s also the ccusage package that I find useful.

Thanks. I don't know why, but I just couldn't find that command. I spent so much time trying to understand what /context and other commands were showing me I got lost in that noise.

How quickly do you also hit compaction when running? Also, if you open a new CC instance and run /context, what does it show for tools/memories/skills percentage? And that's before we look at what you're actually doing. CC will add context to each prompt it thinks is necessary. So if you've got a small number of large files (vs a large number of smaller files), at some level that'll contribute to the problem as well.

Quota's basically a count of tokens, so if a new CC session starts with that relatively full, that could explain what's going on. Also, what language is this project in? If it's something noisy that uses up many tokens fast, even if you're using agents to preserve the context window in the main CC, those tokens still count against your quota so you'd still be hitting it awkwardly fast.


sounds like the "thinking tokens" are a mechanism to extract more money from users?

Anecdotal, but it definitely feels like in the last couple of weeks CC has become more aggressive about pulling in significantly larger chunks of an existing code base - even for some simple queries I'll see it easily ramp up to 50-60k tokens of usage.

This really speaks to the need to separate the LLM you use from the coding tool that uses it. LLM makers using the SaaS model make money on the tokens you spend whether or not they were needed. Tools like aider and opencode (each in its own way) build a map of the codebase that lets them work with the code using fewer tokens. When I see posts like this I start to understand why Anthropic now blocks opencode.
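(As I understand it, aider's repo map, for example, is built with tree-sitter: instead of sending whole files, it sends a ranked summary of file and function signatures, and the model then asks for the few files it actually needs.)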

We're about to get Claude Code for work and I'm sad about it. There are more efficient ways to do the job.


When you state it like that, I now totally understand why Anthropic have a strong incentive to kick out OpenCode.

OpenCode is incentivized to make a good product that uses your token budget efficiently since it allows you to seamlessly switch between different models.

Anthropic as a model provider on the other hand, is incentivized to exhaust your token budget to keep you hooked. You'll be forced to wait when your usage limits are reached, or pay up for a higher plan if you can't wait to get your fix.

CC, specifically Opus 4.5, is an incredible tool, but Anthropic is handling its distribution the way a drug dealer would.


It's like the very first days of computing. IBM supplied both the hardware and the software, and the software did not make the most efficient use of the hardware.

Which was nothing new itself of course. Conflicts of interest didn't begin with computers, or probably even writing.


OpenCode also would be incentivized to do things like having you configure multiple providers and route requests to cheaper providers where possible.

Controlling the coding tool absolutely is a major asset, and will be an even greater asset as the improvements in each model iteration makes it matter less which specific model you're using.


You think after $27 billion invested they're gonna be ethical, rather than get their money back as fast as possible?

I'm curious if anyone has logged the number of thinking tokens over time. My implication was the "thinking/reasoning" modes are a way for LLM providers to put their thumb on the scale for how much the service costs.

they get to see (if not opted-out) your context, idea, source code, etc. and in return you give them $220 and they give you back "out of tokens"


> My implication was the "thinking/reasoning" modes are a way for LLM providers to put their thumb on the scale for how much the service costs.

It's also a way to improve performance on the things their customers care about. I'm not paying Anthropic more than I do for car insurance every month because I want to pinch ~~pennies~~ tokens, I do it because I can finally offload a ton of tedious work on Opus 4.5 without hand holding it and reviewing every line.

The subscription is already such a great value over paying by the token, they've got plenty of space to find the right balance.


> My implication was the "thinking/reasoning" modes are a way for LLM providers to put their thumb on the scale for how much the service costs.

I've done RL training on small local models, and there's a strong correlation between length of response and accuracy. The more they churn tokens, the better the end result gets.

I actually think that the hyper-scalers would prefer to serve shorter answers. A token generated at 1k ctx length is cheaper to serve than one at 10k context, and way way cheaper than one at 100k context.
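(Rough intuition for why, as I understand serving: with standard attention, each newly decoded token attends over the entire KV cache, so per-token cost grows roughly linearly with context length - a token emitted at 100k context costs on the order of 100x the attention reads of one emitted at 1k.)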


> there's a strong correlation between length of response and accuracy

I'd need to see real numbers. I can trigger a thinking model to generate hundreds of tokens and return a 3-word response (however many tokens that is), or switch to a non-thinking model of the same family that just gives the same result. I don't necessarily doubt your experience, I just haven't had that experience tuning SD, for example, which is also xformer-based.

I'm sure there's some math reason why longer context = more accuracy, but is that intrinsic to transformer-based LLMs? That is, per your thought that the 'scalers want shorter responses: do you think they are expending more effort to get shorter, equivalent-accuracy responses, or are they trying to find some other architecture or whatever to overcome the "limitations" of the current one?


I believe Claude Code recently turned on max reasoning for all requests. Previously you’d have to set it manually or use the word “ultrathink”

It's absolutely a work-around in part, but use sub-agents, have the top level pass in the data, and limit the tool use for the sub-agent (the front matter can specify allowed tools) so it can't read more.

(And once you've done that, also consider whether a given task can be achieved with a dumber model - I've had good luck switching some of my sub-agents to Haiku).


> more aggressive at pulling in significantly larger chunks of an existing code base

They need more training data, and with people moving on to OpenCode/Codex, they wanna extract as much data from their current users as possible.


Their system prompt + MCP is more of the culprit here. 16 tools, sophisticated parameters, you're looking at 24K tokens minimum

Probably, because they recently said ultrathink is enabled by default now.

does this translate into "the end-user's cost goes up"

by default?


It's the clanker version of the "Check Wallet Light" (check engine light).

> I've run out of quota on my Pro plan so many times in the past 2-3 weeks.

Waiting for Anthropic to somehow blame this on users again. "We investigated, turns out the reason was users used it too much".


I never run out of this mysterious quota thing. I close Claude Code at 10% context and restart.

I work for hours and it never says anything. No clue why you’re hitting this.

$230 pro max.


Does closing claude code do something that running /clear does not?

Yeah, it re-sends all the agent system prompts.

Any clue why you might be a favored/favoured high value user?

The entire conversation is fed in as context effectively compounding your token usage over the course of a session. Sessions are most efficient when used for one task only.
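Back-of-the-envelope, assuming ~2k tokens per conversation turn: since the whole history is resent every turn, turn n costs roughly 2k x n input tokens, so a 20-turn session burns about 2k x (20 x 21 / 2) = 420k input tokens for only ~40k tokens of actual conversation.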

I get a decent amount of work in before restarts.

Pro is 20x less than Max

Self-hosted might be the way to go soon. I'm getting 2x Olares One boxes, each with an RTX 5090 GPU (NVIDIA 24GB VRAM), and a built-in ecosystem of AI apps, many of which should be useful, and Kubernetes + Docker will let me deploy whatever else I want. Presumably I will manage to host a good coding model and use Claude Code as the framework (or some other). There will be many good options out there soon.

> Self-hosted might be the way to go soon.

As someone with 2x RTX Pro 6000 and a 512GB M3 Ultra, I have yet to find these machines usable for "agentic" tasks. Sure, they can be great chat bots, but agentic work involves huge context sent to the system. That already rules out the Mac Studio because it lacks tensor cores and it's painfully slow to process even relatively large CLAUDE.md files, let alone a big project.

The RTX setup is much faster but can only fit models ≤192GB, which severely limits its capabilities: you're stuck with low-quant GLM 4.7, GLM 4.7 Flash/Air, GPT-OSS 120b, etc.
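Rough arithmetic (assuming a GLM-class model at ~355B total parameters): 4-bit quantization is about 0.5 bytes per parameter, so the weights alone come to ~178GB before KV cache and activations - which is exactly why 192GB only fits the low-quant versions.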


I've been using local LLMs since before chatgpt launched (gpt-j, gpt-neox for those that remember), and have tried all the promising models as they launch. While things are improving faster than I thought ~3 years ago, we're still not there in terms of 1-1 comparison with the SotA models. For "consumer" local at least.

The best you can get today with consumer hardware is something like devstral2-small (24B), qwen-coder 30b (underwhelming), or glm-4.7-flash (promising but buggy atm). And you'll still need a beefy workstation, ~5-10k.

If you want open-SotA you have to get hardware worth 80-100k to run the big boys (dsv3.2, glm4.7, minimax2.1, devstral2-123b, etc). It's ok for small office setups, but out of range for most local deployments (esp considering that the workstations need lots of power if you go 8x GPUs, even with something like 8x 6000pro @ 300w).


I think this is the future as well, running locally, controlling the entire pipeline. I built acf on github using Claude among others. You essentially configure everything as you want, models, profiles, agents and RAG. It's free. I also built a marketplace to sell or give away to the community these pipeline enhancements. It's a project I wanted to do for a while and Claude was nice to me allowing it to happen. It's a work in progress but you have 100% control, locally. There is also a website for those not as technical where you can buy credits or plugin Claude or OpenAI APIs. Read the manifesto. I need help now and contributors.

I've used the Anthropic models mostly through OpenRouter using aider. With so much buzz around Claude Code I wanted to try it out and thought that a subscription might be more cost-efficient for me. I was kinda disappointed by how quickly I hit the quota limit. Claude Code gives me a lot more freedom than what aider can do; on the other hand, I have the feeling that pure coding tasks work better through aider or Roo Code. The API version is also much, much faster than the subscription one.

Being in the same boat as you I switched to OpenCode with z.ai GLM 4.7 Pro plan and it's quite ok. Not as smart as Opus but smart enough for my needs, and the pricing is unbeatable

Ditto. It is very, very slow, but I never hit quota limits. People on Discord are complaining like mad that it's slow even on the Pro plans. I tend to use glm-*air a lot for planning before using 4.7.

I've also seen OpenCode around, but have yet to try it. I wonder how it compares to Roo Code.

Very happy to see that I am not the only one. My Pro subscription lasts maybe 30 minutes out of the 5-hour window. It is completely unusable, and that's why I switched to OpenCode + GLM 4.7 for my personal projects. It's not as clever as Opus 4.5, but it often gets the job done anyway.

You are giving me visions of The Bug Short, where the guy goes to investigate mortgages and knocks on some random person's door to ask about a house/mortgage, just to learn that it belongs to a dog. Imagine finding out that Anthropic employs no humans at all. Just an AI that has fired everyone and been working on its own releases and press releases since.

"Just an AI that has fired everyone"

At least it did not turn against them physically... "get comfortable while I warm up the neurotoxin emitters"


'The Big Short' (2015)

So "The Bug Short" is still up for grabs if anyone wants to make a documentary about the end of the AI bubble? :D

They whistleblowed themselves that Claude Cowork was coded by Claude Code… :)

You can tell they’re all vibe coded.

Claude iOS app, Claude on the web (including Claude Code on the web) and Claude Code are some of the buggiest tools I have ever had to use on a daily basis. I’m including monstrosities like Altium and Solidworks and Vivado in the mix - software that actually does real shit constrained by the laws of physics rather than slinging basic JSON and strings around over HTTP.

It’s an utter embarrassment to the field of software engineering that they can’t even beat a single nine of reliability in their consumer facing products and if it wasn’t for the advantage Opus has over other models, they’d be dead in the water.


You're right.

https://github.com/anthropics/claude-code/issues

Codex has fewer, but they also had quite a few outages in December. And I don't think Codex is as popular as Claude Code, but that could change.


Don't bother filing issues there. Their issue tracker is a galaxy-sized joke. They automatically close issues after 30 days of inactivity even if they weren't fixed, just to keep the issue count low.

The Reasonable Man might think that an AI company OF ALL COMPANIES would be able to use AI to triage bug tickets and reproduce them, but no! They expect humans to keep wasting their own time reproducing, pinging tickets and correcting Claude when it makes mistakes.

Random example: https://github.com/anthropics/claude-code/issues/12358

First reply from Anthropic: "Found 3 possible duplicate issues: This issue will be automatically closed as a duplicate in 3 days."

User replies, two of the tickets are irrelevant, one didn't help.

Second reply: "This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes."

Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.


> Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.

Upcoming Anthropic Press Release: By using Claude to direct users to existing bugs reports, we have reduced tickets requiring direct action by xx% and even reduced the rate of incoming tickets


Single nine reliability would be 90% uptime lol. For 99.9% we call it triple 9 reliability.
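(For concreteness: 90% uptime allows ~36.5 days of downtime per year, 99% about 3.7 days, and 99.9% under 9 hours.)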

Single 9 would be 90%, which is roughly what I’m experiencing between CC for Web and the Claude iOS app. About 1 in 10 messages fail because of an unknown error and 1 in 10 CC for web sessions die irrecoverably. It’d probably be worse except for the fact that CC’s bugs in the terminal aren’t show stoppers like they are on web/mobile.

The only way Anthropic has two or three nines is in read-only mode, but that'd be like measuring AWS by the console uptime while ignoring the actual control plane.


Single nine could be just 9% :D

Even their status page (which are usually gamed) shows two 9s over the past 90 days.

hey, they have 9 8's

Whistleblowed dog food.

Normally you don't share your dog food when you find out it actually sucks.

We’re an Anthropic enterprise customer, and somehow there’s a human developer of theirs on a call with us just about every week. Chatting, tips and tricks etc.

I think they are just focusing on where the dough is.


I think your surmise is probably wrong. It's not that they're growing too fast, it's that their service is cheaper than the actual cost of doing business.

Growth isn't a problem unless you don't actually cover the cost of every user you sign up. Uber, but for poorly profitable business models.


Interesting comparison, Uber.

> Since its founding in 2009, Uber has incurred a cumulative net loss of approximately $10.9 billion.

Now, Uber has become profitable, and will probably become a bit more profitable over time.

But except for speculators and probably a handful of early shareholders, Uber will have lost everyone else money for the first 20 years since its founding.

For comparison, Lyft, Didi, Grab, Bolt are in the same boat, most of them are barely turning profitable after 10+ years. Turns out taxis are a hard business, even when you ramp up the scale to 11. Though they might become profitable over the long term and we'll all get even worse and more abusive service, and probably more expensive than regular taxis would have been, 15-20 years from now.

I mean, we got some better mobile apps from taxi services, so there's that.

Oh, also a massive erosion of labor rights around the world.


I suppose my comparison is that Uber eventually turned a profit and mostly displaced the competitors.

I don't see the current investments turning a profit. Maybe the datacenters will, but most of AI is going to be washed out when somewhere, someone wants to take out their investment and the new Bernie Madoff can't find another sucker.


Well, they vibe code almost every tool at least

Claude Code has accumulated so much technical debt (+emojis) that Claude Code can no longer code itself.

yeah, and it gets so clunky and laggy when the context grows. Anthropic just can't make software and yet they claim 90% of code will be written by AI by yesterday.

What’s the opposite of bootstrapping? Stakebooting?

Sell to Google and run away

pulling yourself down by your chinstrap

Hmm... VC funded?

I believe its "digging your own grave"

> They were arguably right. Pre-literate people could memorise vast texts (Homer's work, Australian Aboriginal songlines). Pre-Gutenberg, memorising reasonably large texts was common. See, e.g., the book Memory Craft.

> We're becoming increasingly like the Wall E people, too lazy and stupid to do anything without our machines doing it for us, as we offload increasing amounts onto them.

You're right about the first part, wrong about the second part.

Pre-Gutenberg people could memorize huge texts because they didn't have that many texts to begin with. Obtaining a single copy cost as much as supporting a single well-educated human for weeks or months while they copied the text by hand. That doesn't include the cost of all the vellum and paper which also translated to man-weeks of labor. Rereading the same thing over and over again or listening to the same bard tell the same old story was still more interesting than watching wheat grow or spinning fabric, so that's what they did.

We're offloading our brains onto technology because it has always allowed us to function better than before, despite an increasing amount of knowledge and information.

> Yes, it's too early to be sure, but the internet, Google and Wikipedia arguably haven't made the world any better (overall).

I find that to be a crazy opinion. Relative to thirty years ago, quality of life has risen significantly thanks to all three of those technologies (although I'd have a harder time arguing for Wikipedia versus the internet and Google) in quantifiable ways from the lowliest subsistence farmers now receiving real time weather and market updates to all the developed world people with their noses perpetually stuck in their phones.

You'd need some weapons grade rose tinted glasses and nostalgia to not see that.


Economists suggest we are in many ways no more productive now than when Homer Simpson could buy a house and raise a family on a single income - https://en.wikipedia.org/wiki/Productivity_paradox

I don’t care if “we” are more productive and I certainly don’t care what western economists think about pre-industrial agriculture. I care that the two billion people living in households dependent on subsistence farming have a better quality of life than they did before the internet or mobile phones, which they undeniably have. That much was obvious fifteen to twenty years ago when mobile networks were rolling out en masse all over Africa and every village I visited on my continental roadtrip had at least one mobile phone that everybody shared to get weather forecasts and coordinate trips to the nearest market town.

Anyone in a developed country who bases their opinions on the effects of technology on their and their friends’ social media addictions is a complete fool. Life has gotten so much better for BILLIONS of people in the last few decades that it’s not even a remotely nuanced issue.

