I think the benefit may be task separation and cleaning the context between tasks. Asking a single session to do all three has a couple of downsides.
1. The context for each task gets longer, which we know degrades performance.
2. In that longer context, implicit decisions accumulate in the thinking steps, and the model is probably more likely to follow through on a bad decision made 20 steps back.
The way Stavros does it is Architect -> Dev -> Review. By splitting the task into three sessions, we get a fresh, shorter context for each one. At minimum, skipping the thinking messages and intermediate tool output should increase the chances of a better result.
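The three-session split above can be sketched in a few lines. This is a minimal sketch, not Stavros's actual setup: `run_session` is a hypothetical stand-in for a real LLM API call (stubbed here so the sketch runs), and the role names are assumptions. The point it illustrates is that each phase receives only the previous phase's final artifact, never its thinking steps or tool transcripts.

```python
# Hypothetical helper standing in for a real LLM call; each invocation
# represents a brand-new session with an empty context window.
def run_session(role: str, prompt: str) -> str:
    # Stub: replace with your provider's API call.
    return f"[{role} output for: {prompt[:40]}]"

def pipeline(task: str) -> str:
    # Each phase sees only the prior phase's final artifact,
    # not the full transcript that produced it.
    plan = run_session("architect", f"Produce an implementation plan for: {task}")
    code = run_session("dev", f"Implement this plan:\n{plan}")
    review = run_session("reviewer", f"Review this implementation:\n{code}")
    return review
```

Because the handoff is a plain string, the context passed forward stays short by construction.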
Using different agent personas and models at least introduces variability in token generation; whether that is good or bad, I don't know. As far as I know, it's generally supposed to help.
Having the sessions communicate is, I think, a mistake, because you lose all the benefits of cleaning up the context. Given the chattiness of LLMs, you are probably going to fill the context with multiple thinking rounds over the same message: one from the session that writes it and one from the session that reads it. You are probably also going to get competing tool use, with each session making its own tool calls to read the same content. It will probably be a huge mess.
The way I do it is I have a large session that I interact with and task with planning and agent spawning. I don't have dedicated personas or agents. The benefits, as I see them, are that I have a single session with extensive context about what we are doing, plus a dedicated task handler with a much more focused context for each piece of work.
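The planner-plus-spawned-workers pattern above could be sketched as follows. Again a hedged sketch, not my literal tooling: `run_session` is a hypothetical LLM call (stubbed to make the sketch runnable), and the key idea shown is that the planner's history grows slowly with summaries while every worker starts with an empty, task-only context.

```python
# Hypothetical stand-in for an LLM call that appends to a session history.
def run_session(history: list[str], prompt: str) -> str:
    history.append(prompt)            # stub: record the prompt, echo a result
    return f"done: {prompt}"

def plan_and_spawn(goal: str, subtasks: list[str]) -> list[str]:
    # One long-lived planning context that persists across the whole project.
    planner_history: list[str] = [f"goal: {goal}"]
    results = []
    for task in subtasks:
        worker_history: list[str] = []            # fresh, focused context per task
        results.append(run_session(worker_history, task))
        planner_history.append(f"finished: {task}")  # planner keeps only summaries
    return results
```

The planner never sees the workers' transcripts, only one summary line per finished task, which is what keeps its context usable over a long project.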
What I have seen with my setup is impressively good performance at the beginning that degrades as feedback and tweaks around the work pile up.
Framing LLM use for dev tasks as "narrative" is powerful.
If you want specific, empirical, targeted advice or work from an LLM, you have to frame the conversation correctly. "You are a tenured Computer Science professor agent being consulted on a data structure problem" goes a very long way.
Similarly, context window length and prior progress exert significant pressure on how an LLM frames its work. At some point (often around 200k-400k tokens in), they seem to reach a "we're in the conclusion of this narrative" point and will sometimes do crazy stuff to reach whatever real or perceived goal there is.
> As someone generally against gambling, I think there's a fair point to be made that Polymarket and similar sites are not fundamentally different from e.g. sports betting.
They are, because the object of the bet is open: it can be abused to generate incentives for desired behavior. For example, if you really want the guy writing about the attack out of the picture, you don't send death threats; you instead make a new bet that says so-and-so does not write for X publication after Y date, place a large bet against it, and let greed and stochastic violence do the rest.
> The fundamental problem of journalism is that the economics no longer works out.
Yes it does. From the NYT's actual earnings release for Q 2025:
1. The Company added approximately 450,000 net digital-only subscribers compared with the end of the third quarter of 2025, bringing the total number of subscribers to 12.78 million.
2. Total digital-only average revenue per user (“ARPU”) increased 0.7 percent year-over-year to $9.72
2025 subscription revenue was 1.950 billion dollars. Advertising was 565 million, of which 155 million was print advertising.
Sure, operating profit is only 550 million, very close to the advertising revenue, but the bulk of their income is subscriptions; they could make it work if they had to. My suspicion is that if they dropped all the Google ads, they would have better subscription retention and conversion rates as well.
Yes, this. I was a subscriber for about a decade, and even back then an adblocker was required for sane reading, subscription or not. I can't imagine what it looks like without an adblocker these days.
I have done minor experiments with disabling JavaScript, and it works: most publications are far more readable with JavaScript disabled. You miss carousels and some interactive elements, but overall it's a much better experience.
10? I always prompt: prepare a document with questions that will inform the technical brief for the task described above. The end result is between 30 and 70 questions most of the time, and 95% of them are valid, i.e. they really help describe what I had in mind. They are either questions I hadn't thought about and would have implicitly answered during implementation, or decisions I had already made in my head but that clearly could have been answered differently.
It's very useful, even just for making your mental model concrete and documented.
And that is just the first round. After I am done, we have another round about new concerns that emerged, and then typically a third.
> The more I study science, the more I come to see how often fundamental facts end up being changed so that a profitable industry can be created
> Frequently, when an industry harms many people, it will create a scapegoat to get out of trouble.
I don't know if he is right or wrong, but he should know that statements like that ring alarm bells. This is the kind of thing people who believe in apple cider vinegar and crystal healing say. Regardless of the actual truth of the matter in this particular case, it is far more rhetorically prudent to adopt more neutral language that, at a minimum, does not assume malice and conspiracy.
Edit:
> Thus, since there are so many vested interests behind the vaccine paradigm ...
For inexplicable reasons, I have found that some operations are only possible through the app. OK, they are not inexplicable: there is the illusion that the app is more secure, and thus some high-impact operations are only possible through it. At least at my bank.