Hacker News

Given that HellaSwag performance seems to correlate with reasoning ability more than other benchmarks do, Falcon certainly looks promising! Hopefully this is a clean result and not the product of dataset contamination.


I've given it a try: for chatting it's good, but for following LangChain prompts it's not.

I guess it depends on the type of work you want to extract from it.



