Author here. I'm a software engineer with zero cybersecurity experience. I entered a beginner CTF at MWC Barcelona mostly to stress-test Pi (a coding agent) on something I knew nothing about.
The most interesting part for me was reviewing the full conversation logs afterward to figure out whether my steering actually helped or hurt. Turns out about 4 of my 24 interventions were counterproductive, and the agent solved the last two phases completely on its own.
The repo has the full writeup, all the exploit scripts, and a table rating every single human message I sent: https://github.com/kafkasl/ctf
Happy to answer questions about the process, the agent, or the competition.
I stopped reading at "The competition itself was a beginner-friendly offensive security CTF..."
Beating a bunch of inexperienced people does not impress me, and is poor sportsmanship as well.
The competition was about solving the challenge, and it was aimed at novices like me, so your point is moot and out of place. The organizers even said out loud that they encouraged AI tools, and they checked on each team (mine included) often.