It's not being pedantic. RLHF and instruction tuning are completely different things. Painting with watercolors does not make water paint.

Nearly all popular local models are instruction tuned, but are not RLHF'd. The OAI GPT series is not the only family of LLMs in the world.



Man, it really doesn't need to be said that RLHF is not the only way to instruction-tune. The point of my comment was that that's how GPT-3.5 was instruction-tuned: via RLHF on a question-answer dataset.

At least this needless nerd snipe means others won't be misled by my careless quip.


But that's still false. RLHF is not instruction fine-tuning. It is alignment. GPT 3.5 was first fine-tuned (supervised, not RL) on an instruction dataset, and then aligned to human expectations using RLHF.
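
For anyone skimming, here's a minimal sketch of the difference between the two stages. To be clear, this is illustrative only, not OpenAI's actual code: the toy model, the single sampled token, the reward_model callable, and the bare REINFORCE update (real RLHF pipelines use PPO with a KL penalty against the SFT model) are all assumptions for brevity.

    import torch
    import torch.nn.functional as F

    vocab, dim = 100, 32
    # Toy "language model": embedding + linear head over the vocabulary.
    lm = torch.nn.Sequential(torch.nn.Embedding(vocab, dim),
                             torch.nn.Linear(dim, vocab))
    opt = torch.optim.SGD(lm.parameters(), lr=1e-2)

    # Stage 1: supervised instruction fine-tuning.
    # Plain next-token cross-entropy against a human-written response.
    def sft_step(prompt_ids, response_ids):
        ids = torch.cat([prompt_ids, response_ids])
        logits = lm(ids[:-1])                  # predict each next token
        loss = F.cross_entropy(logits, ids[1:])
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: RLHF. No target response at all: sample from the model and
    # increase the log-prob of samples a reward model scores highly.
    def rlhf_step(prompt_ids, reward_model):
        logits = lm(prompt_ids)
        dist = torch.distributions.Categorical(logits=logits[-1])
        tok = dist.sample()                    # model's own completion (1 token here)
        r = reward_model(prompt_ids, tok)      # scalar human-preference score
        loss = -dist.log_prob(tok) * r         # policy-gradient update
        opt.zero_grad(); loss.backward(); opt.step()

The key point: sft_step needs a labeled response to imitate, while rlhf_step only needs a scalar preference score, which is why the two steps are different things even though both shape the model's behavior.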


You're right, thanks for the correction


It sounds like we both know that's the case, but there's a ton of incorrect info being shared in this thread re: RLHF and instruction tuning.

Sorry if it came off as more than an attempt to clarify things for folks coming across the thread.


Yes, all that misinfo was what led me to post a quick link. I could have been clearer anyway. Cheers.



