
That's how they are trained initially, but the resulting model isn't all that useful (it was SOTA two years ago, but this field moves fast).

A lot of the utility comes from the later finetuning. You can see this using the examples from the article: every mistake they identify with GPT-3 (which is the unfinetuned version) is answered correctly by ChatGPT, which has gone through an extensive finetuning process called RLHF (reinforcement learning from human feedback).


