Every AI lab trains on the test set. That is a big part of why we see benchmark ... | Hacker News


retinaros 34 days ago | on: Exploiting the most prominent AI agent benchmarks

Every AI lab trains on the test set. That is a big part of why we see benchmark scores climb from 1% to 30% after a few model iterations.

latentsea 34 days ago

Models themselves definitely aren't getting better.
