
This is most apparent with older models like GPT-2, and more generally base ("text") models like Davinci. They are essentially only trained to predict the next word, based on what they've seen on the internet.

So they're a really fun way to try and "sample" the internet, by seeing the kind of outputs they produce.
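To make that concrete, here's a minimal sketch of "sampling" with a base model using the Hugging Face transformers library (the model choice and sampling settings are just illustrative):

    # Sample continuations from a base (non-instruct) model.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    # No instructions, no chat template: just a text fragment the model
    # continues the way internet text statistically tends to continue.
    prompt = "The best thing about living in a small town is"
    outputs = generator(prompt, max_new_tokens=60, do_sample=True,
                        temperature=0.9, num_return_sequences=3)

    for o in outputs:
        print(o["generated_text"])
        print("---")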

--

A really fun thing a friend and I did with GPT-2 back when it came out: we fine-tuned the model on our WhatsApp conversation together.

(The model was small enough to do this on a mid-range GPU in a few hours.)

So GPT-2 learned to simulate chats between us.

It was hilarious, but also weirdly illuminating.

My friend mostly talked about art (true), and didn't say much (true), and I rambled about how I'd finally figured out the One Weird Trick that's gonna turn my life around (...also true)...

The text itself was largely incoherent, but the "feel" (statistical properties?) was spot on.
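For the curious, a rough sketch of that fine-tuning setup, assuming the WhatsApp export has been dumped to a plain-text file ("chat.txt") with one "Name: message" line per turn; model size, block size, and hyperparameters here are illustrative, not what we actually used:

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Tokenize the whole chat log and chop it into fixed-length training blocks.
    text = open("chat.txt", encoding="utf-8").read()
    ids = tokenizer(text)["input_ids"]
    block = 128
    train_data = [{"input_ids": ids[i:i + block]}
                  for i in range(0, len(ids) - block, block)]

    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    args = TrainingArguments(output_dir="gpt2-whatsapp",
                             num_train_epochs=3,
                             per_device_train_batch_size=4)

    Trainer(model=model, args=args, train_dataset=train_data,
            data_collator=collator).train()

    # Afterwards the model continues prompts like "Alice: you around?\nBob:"
    # in the statistical style of the original conversation.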



Modern LLMs are also "essentially only trained to predict the next word, based on what they've seen on the internet."


Well, sort of! The new ones all "sound like AI." They're not great for writing, especially in terms of style and tone.

I'm not sure exactly how that works, but apparently instruct tuning produces mode collapse, whereas the older models are basically "raw unfiltered internet" (for better and worse).

I've been playing around with Davinci and it's remarkable how much better the writing is.

You have to prompt it right, because all it does is continue the text. (E.g. if you ask it a question, it might respond with more questions.) But I think we really lost something valuable when we started neglecting the base/text models.
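For example, here's roughly how you'd frame a prompt for a base model through the OpenAI completions endpoint: set up a document whose natural continuation is the output you want, rather than asking a question. ("davinci-002" is just a stand-in for whichever base/completion model you have access to.)

    from openai import OpenAI

    client = OpenAI()

    # Instead of asking "What are the benefits of walking?", write the start
    # of a text whose continuation is the answer you're after.
    prompt = (
        "The following is an excerpt from an essay on everyday habits.\n\n"
        "Walking every morning changed my life in three ways. First,"
    )

    resp = client.completions.create(
        model="davinci-002",
        prompt=prompt,
        max_tokens=120,
        temperature=0.8,
    )

    print(prompt + resp.choices[0].text)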


Until they've undergone preference tuning and RL post-training (which is expected to start using more training compute than next-token-prediction pre-training).



