This is tied to the TeamPCP activity over the last few weeks. I've been responding and keeping an up-to-date timeline. I hope it might help folks catch up and contextualize this incident:
1. I don't have hard metrics at hand, but with the latest Sonnet I'd say we reach consensus around 80% of the time; with Opus it's almost always, but we aren't using it due to cost
2. The difference I see in agent behavior when they don't reach consensus is usually either
- one of them didn't explore enough and lacks context
- and/or their risk assessment is off
The latter happens often; in other agent-based workflows we now give clear instructions on how to assess risk and where to draw the line for considering something a true positive.
3. Validation runs on Sonnet. We don't use persona-based prompts; all three validators get the same task and context. The agent orchestrating them takes their output and makes the final decision. We use an internal fork of the claude code GitHub Action for now.
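The fan-out-and-decide pattern above can be sketched roughly as below. This is not code from the actual setup; it's a minimal illustration assuming each validator returns a verdict label, with stub functions standing in for the three identical Sonnet calls and the orchestrator reduced to a majority vote (the real orchestrating agent presumably reasons over the full outputs, not just labels):

```python
from collections import Counter

def orchestrate(validators, task, context):
    """Fan the same task and context out to every validator,
    then have the orchestrator pick the majority verdict."""
    verdicts = [validate(task, context) for validate in validators]
    verdict, count = Counter(verdicts).most_common(1)[0]
    consensus = count == len(verdicts)  # True only if all validators agree
    return verdict, consensus

# Hypothetical stubs standing in for three identical model calls.
v1 = lambda task, ctx: "true_positive"
v2 = lambda task, ctx: "true_positive"
v3 = lambda task, ctx: "false_positive"

verdict, consensus = orchestrate([v1, v2, v3], "finding-123", "repo context")
# One dissenting validator: majority verdict wins, but no consensus.
```

In practice the interesting cases are exactly the non-consensus ones, which is where the risk-assessment instructions mentioned above come in.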
* it can inform triage: if you use the extension, you're more likely to be impacted
* because it was VSCode, Workspace Trust actually partially mitigated this in at least 38 cases
I have evidence of at least 250 successes for the prompt. Claude definitely appears to have a higher rejection rate. Q also rejects fairly consistently (which makes sense, since it's based on Claude).
which you already have. Your CNAME record at www.rami.wiki needs to point to "ramimac.github.io" (a DNS CNAME can only target a hostname, not a path like /wiki), and the CNAME file in the root of your repo needs to contain "www.rami.wiki" (the www is necessary).
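Concretely, the two pieces of the GitHub Pages setup look something like this (zone syntax will vary by DNS provider; the record target must be a bare hostname):

```
; DNS record at your provider — hostname target only, no path
www.rami.wiki.   CNAME   ramimac.github.io.

; CNAME file at the root of the Pages repo — a single line
www.rami.wiki
```

GitHub Pages then routes requests for www.rami.wiki to the right repo based on the CNAME file's contents.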
https://ramimac.me/trivy-teampcp/#phase-09