I first tried to make the agent follow concrete paths, but no trick worked. I wanted to create universal "rules" that would be followed without any additional CLI or hooks, and it didn't work out as expected. I needed this for my project, so currently I enforce more things through a CLI that the agent can access directly, which means it's now "guided" more forcefully.
I also tried to avoid that: the main orchestrating agent ran subagents that executed experiments, while other subagents checked the results. It somewhat worked, in the sense that it didn't report only successes; there were actually more failures.
I've been working with an agent as a secretary for 3-4 weeks now. CLAUDE.md, daily journal, state file, pipeline tracking.
bigbezet is right, agents have no clue what's worth remembering. What works for me is splitting it: the agent writes down what happened, and I decide what actually matters. There are two places to manage: the journal and STATE.md, which I ask the agent to maintain according to my expectations. The agent can read the journal if it needs to, but the main place to check status is STATE.md.
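For anyone curious, here is a minimal sketch of what such a STATE.md could look like; the sections are purely illustrative, not a fixed format, and the exact layout depends on the project:

```
# STATE

## Current focus
- the one or two things actively being worked on

## In progress
- task: short status, next step, blocker if any

## Done recently
- only items confirmed as actually done

## Open questions / decisions needed
- things the agent should not decide on its own
```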
One thing I haven't seen anyone mention, though: after a few weeks of reading your rants about some coworker, the agent just takes your side on everything. I had to literally add "consider the other person's perspective" to my rules file, because the journal has too many one-sided notes. Otherwise you end up with a yes-man that has perfect memory.
The trauma replay thing gaigalas mentioned is real too. I found it hard to keep the agent from being biased. To be frank, I keep noticing patterns like this myself:
- I complain, and the agent defends me.
- I paste into the chat a response from another LLM that wasn't biased by my journal. The agent flips sides and now says the research makes a lot of sense.
- I say: "How much biased you are right now." and it responds something about being biased and "... to be frank, the truth is: ...".
Even when I ask it not to be biased, it still plays biased because it thinks that's what I actually expect. Sneaky bastard.
Recently I've been working on a project in that area (maybe a little different, but still close enough :)). Through that I met Alonso, who is more into the area you're covering with this project. In general, I'm looking for an answer to how to turn intent into predictably working code with no flaws. Please take a look at our conversation here: https://github.com/krzysztofdudek/Yggdrasil/issues/3. We're sharing our thoughts on this problem there.
This is based on Karpathy's autoresearch pattern, generalized to work outside ML. I was doing something similar for my own projects and packaged it up.