
That's a great way of framing it. I expect we will ruin the internet as a useful training corpus by flooding it with generated articles, and we will end up with a pre-AI cutoff date that we use to filter incoming data in order to avoid them.

I wouldn't be surprised if filtering regular, pre-LLM bot spam was already a massive hurdle when collating data for ChatGPT.


