
That is indeed an example, but it still isn't an explanation. Sure, humans can also generate output without understanding; I've done it myself as an undergrad, throwing words together at 2am to make a deadline. I think quite a few people have remarked that LLMs seem to write like a human who isn't paying attention, which squares with what you said.

But the question remains why this is possible! A good enough pure predictor could play novel games or devise new theorems, but GPT absolutely can't. It can, however, give confident explanations, write passable code, and occasionally even do some novel problem-solving (simple, impressive only because it comes from a computer, but still there). The question of why it can do some things and not others is interesting, and can't be swept under the rug just by reiterating that it's a predictor.

Does the structure of language really do such a good job of conveying information that GPT can operate on it blindly and get results, Blindsight-style? Is composing prose far easier than we expect, leaving the bulk of the model free to do a tiny amount of "reasoning" that we find unjustly impressive because of how well it's presented? Is it handicapped primarily by the fact that it can only carry out extremely short computations per token, and can't be trained to use chain-of-thought reasoning to get around that limitation? We have no idea what, if any, inherent limitations predictive models have. We have no idea why GPT-sized models are good at the things they are, and bad at the things they aren't.
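(To make the chain-of-thought point concrete, here's a minimal sketch in Python. query_model is a hypothetical stand-in for whatever inference API you'd actually call, not any real library's function; the only difference between the two prompts is that the second asks the model to spend tokens on intermediate steps before answering, which effectively buys it more computation per answer.)

    # Sketch only: query_model is a hypothetical stand-in for a real
    # LLM inference call, not any particular library's API.
    def query_model(prompt: str) -> str:
        raise NotImplementedError("plug in a real model call here")

    QUESTION = ("A train leaves at 3:40pm and the trip takes "
                "95 minutes. When does it arrive?")

    # Direct prompt: the model has to commit to an answer with only
    # the fixed amount of computation it gets per output token.
    direct_prompt = QUESTION + "\nAnswer:"

    # Chain-of-thought prompt: asking for intermediate steps lets the
    # model emit extra tokens first, lengthening the computation that
    # happens before the final answer.
    cot_prompt = QUESTION + "\nLet's work through this step by step:"

Whether that trick can be trained in rather than prompted for is exactly the open question.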


