I am using GPT-5.2 Codex with reasoning set to high via OpenCode and Codex, and when I ask it to fix an E2E test, it tells me it fixed it and prints a command I can run to verify the changes, instead of checking whether the test actually passes and looping until it does. This is just one example of how lazy/stupid the model is. It _is_ a skill issue, on the model's part.
Yeah, I meant it more like it's not intuitive to me why OpenAI would fumble it this hard. They have got to have tested it internally and seen that it sucked, especially compared to GPT-5.2.