Given rapidly decelerating quality of, at least, claude code output, the agentic coding use may decrease. It is insane how bad the results of background agents are now: constant hallucinations, nonsensical outputs.
The heavy users of Claude at my job disagree (me included), our work gets shipped and the quality has increased by all metrics. Are you talking about enterprise or consumer Claude subscriptions? I think they're serving drastic different quality depending on how much $ you fork up.
I don't see much sense to have hn as support thread, but here are quotes from my single claude investigation session, and that happens in every claude code session that I have, especially with 4.7
* The first agent's claim that was 3.x-only was wrong
* is nice-to-have but doesn't target our exact case as cleanly as the agent claimed.
* The agent's "direct fix for yyy" is overstated.
* not 57% as the earlier agent claimed
etc etc etc
And I forgot how many times my session with claude starts: did you read my personal CLAUDE.md and use background agents for long running operations?
I use enterprise subscription, max effort, was with both 4.6 and 4.7.
And please refrain from comments like "you're using it wrong", as the drop in output quality is very clear and noticeable.
Much like Windows threads with people experience strange bugs without knowing any of their workloads and tools, it's impossible to say. We've got a team of 30 using it full time, and as a member of end leadership I would be hearing if it was constantly missing expectations. It did take iterations to get here, as with everything.
Some of the usual suspects when people are getting bad results:
* Overbloated claude.md, it should not contain everything, it should be a table of contents pointing to other files
* Max effort - why? Overthinking on simpler tasks results in degraded quality, much like in humans.
* You speak of your single session but with agents reviewing other agent outputs. Without knowing your goal and your prompt, and what the agents had access to, my first inclination is that the initial request was vague, a bunch of unnecessary info was returned, and your review step caught that extra jank.
I'm not gonna bother making the joke you allude to, but every single employee I've worked with in person has had glaring holes in their setups which, once solved, dramatically reduced stuff like what you're talking about.