Using the managed runtime analogy, what you are saying is that, if I wanted to benchmark LLMs the way I would benchmark runtimes, I would need to account for the delta between versions, plus the delta between whatever memory each of them may have. I don’t see how that helps with reproducibility.

Perhaps more importantly, how would I quantify such “memory”? In other words, how could I verify that two memory inputs are the same, and how could I formalize the full set of such inputs that yield the same outputs?
