Man, it really doesn't need to be said that RLHF is not the only way to instruction-tune. The point of my comment was that that's how GPT-3.5 was instruction-tuned: via RLHF on a question-answer dataset.
At least now we have this needless nerd snipe so others won't be misled by my careless quip.
But that's still false. RLHF is not instruction fine-tuning. It is alignment.
GPT-3.5 was first fine-tuned (supervised, not RL) on an instruction dataset, and then aligned to human expectations using RLHF.
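To make the distinction concrete: instruction fine-tuning is ordinary supervised next-token prediction on (instruction, response) pairs. Here's a minimal sketch using Hugging Face transformers — gpt2 and the prompt format are stand-ins for illustration, not what OpenAI actually used:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM works the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One (instruction, response) pair; the format is illustrative.
prompt = "### Instruction:\nName the capital of France.\n\n### Response:\n"
target = "Paris."

batch = tok(prompt + target, return_tensors="pt")
labels = batch["input_ids"].clone()
# Mask the prompt tokens so the loss is computed on the response only.
prompt_len = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
labels[:, :prompt_len] = -100

loss = model(**batch, labels=labels).loss  # plain supervised cross-entropy
loss.backward()
optimizer.step()
```

No reward model, no sampling, no policy gradient anywhere — it's the same loss as pretraining, just on curated instruction data.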
Nearly all popular local models are instruction-tuned but not RLHF'd. The OAI GPT models are not the only LLMs in the world.
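For contrast, the RLHF step those local models skip looks very different: there are no gold responses; the policy samples its own output and a separately trained reward model scores it. A toy REINFORCE-style sketch (real RLHF uses PPO plus a KL penalty against the SFT checkpoint; reward_model here is a hypothetical placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)

prompt = tok("Name the capital of France.", return_tensors="pt")["input_ids"]
# Sample a response from the current policy -- no target answer exists.
out = policy.generate(prompt, max_new_tokens=8, do_sample=True)
response = out[:, prompt.shape[1]:]

def reward_model(tokens):
    # Hypothetical stand-in; in practice a model trained on human preferences.
    return torch.tensor(1.0)

# Log-probability of the sampled response under the current policy.
logits = policy(out).logits[:, prompt.shape[1] - 1:-1, :]
logp = torch.log_softmax(logits, dim=-1).gather(-1, response.unsqueeze(-1)).sum()
loss = -reward_model(response) * logp  # reward-weighted, not supervised
loss.backward()
optimizer.step()
```

The training signal comes entirely from the reward score rather than from labeled responses, which is why this stage is alignment rather than instruction tuning.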