Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

And I think this is a common problem actually — figuring out what to measure and how to measure it – it's not black and white. What I do is have a few dimensions to measure it against (this may or may not fit your use case): relevance, instruction following, clarity, hallucination rate, etc. but even then, it becomes hard to measure things like 'clarity'.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: