Until they've undergone preference tuning and RL post-training (which is expecte...

		in-silico 3 months ago \| parent \| context \| favorite \| on: Markov chains are the original language models Until they've undergone preference tuning and RL post-training (which is expected to start using more training compute than next-token-prediction pre-training).