
I mean, that's really the mystery of it. One of the most notable advances of recent LLMs has been emergent, one-shot, and highly-contextually-aware behaviors that seem to only manifest in models with extremely large numbers of parameters.

Clearly the latent space of the model is able to encode some sort of reasoning about how time flows, typical information about houses, how to transform and format information in a JSON snippet, etc. That's the "magic" of it all: amazing and powerful emergent behaviors from billions of weights.

Of course, that's also the limitation. They're opaque and incredibly difficult (impossible?) to inspect and reason about.



“seem to only manifest in models with extremely large numbers of parameters”

You can get the same kind of behavior from models trained on your laptop in a few seconds. The trick is to let the model use attention over what it has learned; the same mechanism also works on images and other types of data.

The benefit of a large parameter count is more that you can train in parallel, train faster, and "remember" more from the data.
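The attention mechanism referred to above can be sketched in a few lines of NumPy. This is a toy single-head version with random weights, not any particular model's implementation; the shapes and variable names are made up for illustration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over rows
    return weights @ V  # each token's output mixes in its context

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 "tokens", each an 8-dim embedding
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one contextualized vector per input token
```

Because attention is input-agnostic (it only sees sequences of vectors), the same code applies to image patches or any other embedded data.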


Thanks for the pointer.

Any idea how clever the wrapper around these things is? For example, would OP's use case simply get forwarded to the neural network as one single input (a string of words), or is there some clever preprocessing going on?


The only significant bit of preprocessing by the language model is tokenization. You can see how it works here: https://beta.openai.com/tokenizer
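To make that concrete: tokenization maps raw text to integer IDs before the network ever sees it. Real tokenizers (like the BPE tokenizer behind the link above) learn their subword vocabulary from data; the tiny vocabulary below is invented purely to show the greedy longest-match idea:

```python
# Toy subword vocabulary (hypothetical; real vocabularies have ~50k entries).
VOCAB = {"un": 0, "believ": 1, "able": 2, "!": 3, "<unk>": 4}

def tokenize(text: str) -> list[int]:
    """Greedy longest-match segmentation of text into vocab IDs."""
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest piece first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            ids.append(VOCAB["<unk>"])  # no match: emit unknown token
            i += 1
    return ids

print(tokenize("unbelievable!"))  # → [0, 1, 2, 3]
```

The model itself only ever sees these ID sequences; everything else (the prompt string, the JSON in OP's example) is just text from its point of view.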



