Hacker News

When you buy, or pirate, a book, you didn't enter into a business relationship with the author specifically forbidding you from using the text to train models. When you get tokens from one of these providers, you sort of did.

I think it's a pretty weak distinction. By separating the concerns, with one company that collects a corpus and then "illegally" sells it for training, you can pretty much exactly reproduce the acquire-books-and-train-on-them scenario. But in the simplest case, the EULA does actually make it slightly different.

Like, if a publisher pays an author to write a book, with the contract specifically saying the publisher is not allowed to train on that text, and then the publisher trains on it anyway, that's clearly worse than someone just buying a book and training on it, right?




> When you buy, or pirate, a book, you didn't enter into a business relationship with the author specifically forbidding you from using the text to train models.

Nice phrasing, using "pirate".

Violating the TOS of an LLM is the equivalent of pirating a book.


Contracts can't exclude things that weren't invented when the contracts were written.

Ultimately it's up to legislation to formalize rules, ideally based on principles of fairness. Is it fair, in a non-legalistic sense, for all old books to be trainable-on, but not LLM outputs?



