I proposed using reinforcement learning to guide coverage as a potential phd top...

lolsowrong · on Oct 6, 2024

Did you try making small changes to your phd proposal to see if it opened up new paths?

</fuzzingjoke>

carom · on Oct 6, 2024

I think it would go the other way, where you use coverage to guide reinforcement. Crank the temperature up to increase variation and you would probably produce a model that could approximate the file format you were targeting.
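
For anyone unfamiliar with the temperature knob, here is a minimal sketch of temperature-scaled sampling. It assumes the model emits one logit per candidate byte or token; `softmax` and `sample` are hypothetical helper names, not taken from any particular fuzzer or ML library.

```python
import math
import random

def softmax(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature, rng):
    """Pick one index according to the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

rng = random.Random(0)
logits = [1.0, 2.0, 3.0]          # made-up scores for three candidate bytes
cold = softmax(logits, 0.01)      # nearly all mass on the top-scoring choice
hot = softmax(logits, 100.0)      # close to uniform, so far more variation
choice = sample(logits, 100.0, rng)
```

At a low temperature the sampler almost always repeats the most likely choice; at a high temperature it spreads out across all candidates, which is the extra variation a fuzzer would want.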

daghamm · on Oct 6, 2024

Please tell us more!

Fuzzing is often a special case of genetic algorithms, so there is already a tiny connection to RL. I'm curious to hear what your proposal was.
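
To make the genetic-algorithm connection concrete, here is a toy sketch of a coverage-guided mutation loop. Everything in it is an assumption for illustration: `target` is a made-up program that reports which branches an input reached, and `mutate` and `fuzz` are hypothetical names, not the API of any real fuzzer.

```python
import random

def target(data: bytes) -> set:
    """Hypothetical target: returns the ids of the branches this input reached."""
    covered = {0}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add(3)
    return covered

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte (assumes a non-empty input)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, iterations: int, rng_seed: int = 0):
    """Keep a population (the corpus) and retain mutants that reach new branches."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    seen = set(target(seed))
    for _ in range(iterations):
        parent = rng.choice(corpus)        # selection
        child = mutate(parent, rng)        # mutation
        new = target(child) - seen         # fitness: did coverage grow?
        if new:
            seen |= new
            corpus.append(child)           # the child survives into the population
    return corpus, seen

corpus, seen = fuzz(b"AAAA", 50_000)
```

The corpus plays the role of the population and "reached a new branch" is the fitness test. There is no crossover here, which is one reason this is only a special case of a genetic algorithm.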

ackbar03 · on Oct 7, 2024

> Fuzzing is often a special case of genetic algorithms

Yes, that was sort of why I thought RL-guided fuzzing could work, and possibly work better. Also, for things like XSS fuzzing (which I have a little experience in), it is possible for an experienced attacker to intelligently guide the fuzzing toward a payload, which could in theory be mimicked through RL.

There wasn't really anything novel in the proposal. It was just for a graduate cyber-security course, and one of the deliverables was a project proposal for something related. There was already some existing work at that time (around 2-3 years ago) where people tried combining RL with fuzzing, and I just mish-mashed some ideas together so I could hand in something.

My main concern at the time, however, was that with fuzzing the positive signal would be very rare compared to the negative signal, since most randomly fuzzed inputs would just return the same negative feedback. I wasn't sure that would be enough signal to train an RL system. I'm not quite sure what new progress has been made in the field since then.
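
A rough way to see how sparse that signal can get, as a sketch under made-up assumptions: suppose the target checks a hypothetical three-byte magic prefix one byte at a time, and the reward is the number of leading bytes matched. `MAGIC`, `reward`, `exact_rate`, and `empirical_rate` are illustrative names only.

```python
import random

MAGIC = b"FUZ"  # hypothetical magic prefix that the target checks byte by byte

def reward(data: bytes) -> int:
    """Count leading magic bytes matched; 0 is the usual 'nothing happened' signal."""
    matched = 0
    for got, want in zip(data, MAGIC):
        if got != want:
            break
        matched += 1
    return matched

def exact_rate(depth: int) -> float:
    """Chance that a uniformly random input matches at least `depth` magic bytes."""
    return 256.0 ** -depth

def empirical_rate(samples: int, rng_seed: int = 0) -> float:
    """Fraction of uniformly random 4-byte inputs that earn any reward at all."""
    rng = random.Random(rng_seed)
    hits = 0
    for _ in range(samples):
        data = bytes(rng.randrange(256) for _ in range(4))
        if reward(data) > 0:
            hits += 1
    return hits / samples

first_byte = exact_rate(1)   # 1 in 256 random inputs get any positive signal
full_match = exact_rate(3)   # 1 in 256**3, roughly 6e-8
observed = empirical_rate(10_000)
```

Under these assumptions, more than 99% of random inputs return a reward of 0, and the full match is far rarer still. Graded feedback such as new coverage is one way to densify the signal, but whether that is enough to train an RL policy is exactly the open question raised above.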