That's how they are trained initially, but the resulting model isn't all that useful on its own (it was SOTA two years ago, but this field moves fast).

A lot of the utility comes from the later fine-tuning. You can see this in the article's own examples: every mistake they identify with GPT-3 (the version without fine-tuning) is answered correctly by ChatGPT, which has gone through an extensive fine-tuning process called RLHF (reinforcement learning from human feedback).