Hacker News

If it disregards "NEVER do" instructions, why would it honor your denial when it asks?


You mean like in this example? https://web.archive.org/web/20260313042512/https://gist.gith...

There is never a guarantee with GenAI. If you need to be sure, sandbox it.
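To illustrate the "sandbox it" point, here is a minimal sketch of confining an agent-proposed command. The function name and approach are hypothetical; a real sandbox would also drop privileges, restrict the filesystem, and cut network access (containers, seccomp, or a VM), not just a scratch directory and a timeout.

```python
import shlex
import subprocess
import tempfile

def run_sandboxed(cmd: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    # Hypothetical sketch: run the command in a throwaway working directory
    # with a hard timeout, so stray writes are confined and hangs are killed.
    # This is NOT a complete sandbox; it only limits where relative-path
    # writes land and how long the process may run.
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            shlex.split(cmd),   # no shell=True, so no shell-injection surface
            cwd=scratch,        # the directory is deleted when the block exits
            capture_output=True,
            text=True,
            timeout=timeout,
        )

result = run_sandboxed("echo hi")
print(result.stdout.strip())
```

The point stands either way: you can't prompt your way to a guarantee, but you can constrain what a wrong action is able to do.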


There are plenty of examples in the RL training showing it how and when to prompt the human for help or additional information. This is even a common tool in the "plan" mode of many harnesses.
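For concreteness, an "ask the human" tool usually looks something like the sketch below, written in the JSON-schema style common tool-calling APIs use. The name and fields here are hypothetical, not any specific harness's actual schema; the point is that asking for help is an explicit action the model is trained to emit.

```python
# Hypothetical tool definition: asking the user becomes a callable action,
# so RL trajectories can show the model when invoking it is the right move.
ask_human = {
    "name": "ask_human",
    "description": "Pause and ask the user a clarifying question before proceeding.",
    "parameters": {
        "type": "object",
        "properties": {
            "question": {
                "type": "string",
                "description": "The question to put to the user.",
            }
        },
        "required": ["question"],
    },
}

print(ask_human["name"])
```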

Conversely, it's much harder to represent *not* doing something in the training data.


Because it’s just fancy auto-complete.





