> The slot machine can drop any hard requirement that you specifically in your AGENTS.md, memory.md or your dozens of skill markdowns. Pretty much guaranteed.
Indeed. That said, I’ve had some success with agent skills, but I use them to make the LLM aware of things it can do using specific external tools. I think it is a really bad idea to use this mechanism to enforce safety rules. We need good sandboxing for this, and promises from a model prone to getting off the rails is not a good substitute.
But I have taught my coding agent to use some ad hoc tools to gather statistics from a directory containing experimental data and things like that. Nobody is going to fine tune a LLM specifically for my field (condensed matter Physics) but using skills I still can make it useful work. Like monitoring simulations where some runs can fail for various reasons and each time we must choose whether to run another iteration or re-start from a previous point, based on eyeballing the results ("the energy is very strange, we should restart properly and flag for review if it is still weird", this sort of things). I don’t give too many rules to the agent, I just give it ways of solving specific problems that may arise.
Not really, unfortunately. I took some inspiration from existing skills, mostly in the official GitHub repo https://github.com/agentskills/ . But mostly I had to come up with them myself. I try to use Claude to help but it was not that useful.
Indeed. That said, I’ve had some success with agent skills, but I use them to make the LLM aware of things it can do using specific external tools. I think it is a really bad idea to use this mechanism to enforce safety rules. We need good sandboxing for this, and promises from a model prone to getting off the rails is not a good substitute.
But I have taught my coding agent to use some ad hoc tools to gather statistics from a directory containing experimental data and things like that. Nobody is going to fine tune a LLM specifically for my field (condensed matter Physics) but using skills I still can make it useful work. Like monitoring simulations where some runs can fail for various reasons and each time we must choose whether to run another iteration or re-start from a previous point, based on eyeballing the results ("the energy is very strange, we should restart properly and flag for review if it is still weird", this sort of things). I don’t give too many rules to the agent, I just give it ways of solving specific problems that may arise.