> Very often, after a correction, it will focus a lot on the correction itself making for weird-sounding/confusing statements in commit messages and comments.
I've experienced that too. Usually when I request a correction, I add something like "Include only production-level comments (not change notes)". Recently I also added a special instruction for this to CLAUDE.md.
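For illustration, an instruction along these lines (a sketch of the idea, not my exact wording):

```markdown
# CLAUDE.md

## Comments and commit messages
- Write comments that describe the code as it is now, not the change you just made.
- Never mention corrections, refactors, or earlier attempts in comments or commit messages.
```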
For some time now, Claude Code's plan mode has also been writing the plan to a file that you can edit. It's located in ~/.claude/plans/ for me; in fact, the whole history of plans is there.
I sometimes reference some of them to build context, e.g. after a few unsuccessful attempts to implement something, so that Claude doesn't try the same approach again.
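To pull up the latest plan from the shell, something like this works for me (~/.claude/plans is just where the files happen to land on my machine, not a documented interface; adjust the path if yours differs):

```shell
# Sketch: print the name of the most recently written plan file.
# The directory location is an assumption based on my own install.
PLANS_DIR="$HOME/.claude/plans"
if [ -d "$PLANS_DIR" ]; then
  ls -t "$PLANS_DIR" | head -n 1   # newest plan first
else
  echo "no plans directory at $PLANS_DIR"
fi
```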
Saw this the other day and loved it. Especially seeing Opus 4.5 degrade prior to the 4.6 release (IIRC), while Codex stayed very stable and even improved over time.
But FYI, the blog post is not about the actual model being dumbed down, but about the command-line interface.
What I haven't seen discussed anywhere so far is how big a lead Anthropic seems to have in intelligence per output token, e.g. if you look at [1].
We already know that intelligence scales with the log of tokens used for reasoning, but Anthropic seems to have much more powerful non-reasoning models than its competitors.
I read somewhere that they have a policy of not advancing capabilities too much, so could it be that they are sandbagging and releasing models with artificially capped reasoning to stay at a similar level to their competitors?