Everyone accepts that the quality of output from LLMs is largely predicated on grounding them, but few seem to realize that grounding applies to more than raw data.

They perform better at many tasks simply by grounding their alignment in-context, by telling them the very specific person they should act as.

It's an example of something that "prompt engineering" solves today, and that people only glancingly familiar with how LLMs work insist won't be needed soon... but by their very nature the models will always have this limitation.

Say user A is an expert with 10 years of experience and user B is a beginner with 1 year of experience: they both enter the same question, and all the model has to go on is the tokens in that question.

The model might have uncountably many ways to reply to that question if you had inserted more tokens, but with only the question in context, you'll always get answers clustered around the mean answer it can produce... and because it's the literal mean of all those possibilities, it's unlikely that either user A or user B will find it particularly great.

Because of that there's no way to ever produce an answer that satisfies both A and B to the full capabilities of that LLM. When the input is just the question you're not even touching the tip of the iceberg of knowledge it could have distilled into a good answer. And so just as you're finding that Claude's pushback and advice is useful, someone else will say it's more finicky and frustrating than GPT-3.5.

It mostly boils down to the fact that groups of users aren't really defined by the mean. No one is the average of all developers in terms of understanding (if anything, that would make you an exceptional developer); instead, people are clustered around various levels of understanding in very complex ways.

-

With that in mind, instead of banking on the alignment and training data of a given model happening to make the answer to that question good for you, you can trivially "ground" the model: tell it you're a senior developer speaking frankly with a coworker who's open to pushback and realizes you might have the XY problem or similar fallacies in your question.

You can remind it that it's allowed to be unsure, or to say when it's very sure; you can even ask it to list the gaps in its abilities (or yours!) that are most relevant to a useful response.
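
A minimal sketch of what that kind of grounding can look like over the API (the client, model name, and prompt wording here are just placeholders, not a prescription):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The system message is where the "who am I talking to" grounding lives.
    grounding = (
        "The user is a senior developer speaking frankly with a coworker. "
        "Push back when warranted, point out if the question looks like an "
        "XY problem, say when you are unsure, and note any gaps in your "
        "knowledge that matter for the answer."
    )

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": grounding},
            {"role": "user", "content": "Why is my worker pool deadlocking under load?"},
        ],
    )
    print(resp.choices[0].message.content)

The same question asked with and without that system message tends to get noticeably different answers, which is the point being made above.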

That's why hearing "model X can't do Y but model Z can" doesn't really pass muster for me at this point unless exactly how Y was fed into the model is shared.



> The model might have uncountably many ways to reply to that question if you had inserted more tokens, but with only the question in context, you'll always get answers clustered around the mean answer it can produce... and because it's the literal mean of all those possibilities, it's unlikely that either user A or user B will find it particularly great.

I refer to it as giving the LLM "pedagogical context" since a core part of teaching is predicting what kind of answer will actually help the audience depending on surrounding context. The question "What is multiplication?" demands a vastly different answer in an elementary school than a university set theory class.

I think that's why there's such a large variance in HNers' experience with ChatGPT. The GPT API with a custom system prompt is far more powerful than the ChatGPT interface, specifically because it grounds the conversation in a way that the moderated ChatGPT system prompt can't.

The chat GUI I created for my own use has a ton of different roles that I choose based on what I'm asking. For example, when discussing cuisine I have roles like (shortened and simplified) "Julia Child talking to a layman who cares about classic technique", "expert molecular gastronomy chef teaching a culinary school student", etc.
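
For what it's worth, the role-switching part of a GUI like that can be very little code. A rough sketch, where the role names and prompt text are made up to echo the examples above rather than being the author's actual prompts:

    # Hypothetical role presets; each one becomes the system prompt.
    ROLES = {
        "classic technique": (
            "You are Julia Child talking to a layman who cares about "
            "classic technique. Explain the why behind each step."
        ),
        "molecular gastronomy": (
            "You are an expert molecular gastronomy chef teaching a culinary "
            "school student. Be precise about temperatures and ratios."
        ),
    }

    def build_messages(role: str, question: str) -> list[dict]:
        """Prepend the chosen role's system prompt to the user's question."""
        return [
            {"role": "system", "content": ROLES[role]},
            {"role": "user", "content": question},
        ]

    # The result can then be passed to whatever chat completion endpoint the
    # GUI is built on, e.g. client.chat.completions.create(model=..., messages=...)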


Exactly. You can't treat these systems as a singular entity; you conjure the expert you need for the task.



