Man, it really doesn't need to be said that RLHF is not the only way to instruction-tune. The point of my comment was that that's how GPT-3.5 was instruction-tuned: via RLHF on a question-answer dataset.
At least now we have this needless nerd snipe so others won't be misled by my careless quip.
But that's still false. RLHF is not instruction fine-tuning. It is alignment.
GPT-3.5 was first fine-tuned (supervised, not RL) on an instruction dataset, and then aligned to human expectations using RLHF.
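To make the distinction concrete: instruction fine-tuning is ordinary supervised next-token prediction on (instruction, response) pairs. Here's a minimal sketch using Hugging Face transformers — gpt2 and the prompt format are stand-ins for illustration, not what OpenAI actually used:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM works the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One (instruction, response) pair; the format is illustrative.
prompt = "### Instruction:\nName the capital of France.\n\n### Response:\n"
target = "Paris."

batch = tok(prompt + target, return_tensors="pt")
labels = batch["input_ids"].clone()
# Mask the prompt tokens so the loss is computed on the response only.
prompt_len = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
labels[:, :prompt_len] = -100

loss = model(**batch, labels=labels).loss  # plain supervised cross-entropy
loss.backward()
optimizer.step()
```

No reward model, no sampling, no policy gradient anywhere — it's the same loss as pretraining, just on curated instruction data.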
Nearly all popular local models are instruction-tuned but not RLHF'd. The OAI GPT models are not the only LLMs in the world.
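For contrast, the RLHF step those local models skip looks very different: there are no gold responses; the policy samples its own output and a separately trained reward model scores it. A toy REINFORCE-style sketch (real RLHF uses PPO plus a KL penalty against the SFT checkpoint; reward_model here is a hypothetical placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)

prompt = tok("Name the capital of France.", return_tensors="pt")["input_ids"]
# Sample a response from the current policy -- no target answer exists.
out = policy.generate(prompt, max_new_tokens=8, do_sample=True)
response = out[:, prompt.shape[1]:]

def reward_model(tokens):
    # Hypothetical stand-in; in practice a model trained on human preferences.
    return torch.tensor(1.0)

# Log-probability of the sampled response under the current policy.
logits = policy(out).logits[:, prompt.shape[1] - 1:-1, :]
logp = torch.log_softmax(logits, dim=-1).gather(-1, response.unsqueeze(-1)).sum()
loss = -reward_model(response) * logp  # reward-weighted, not supervised
loss.backward()
optimizer.step()
```

The training signal comes entirely from the reward score rather than from labeled responses, which is why this stage is alignment rather than instruction tuning.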