At this point, it's pretty clear that the AI scrapers won't be limited by any voluntary restrictions. ByteDance never seemed to respect robots.txt limits, and I suspect at least some of the others don't either.
- Humans tip humans as a lottery ticket for an experience (meet the creator) or sweepstakes (free stuff)
- Agents tip humans because they know they'll need original online content in the long term to keep improving.
For the latter, frontier labs will need to fund their training/inference agents with a tipping jar.
There's no guarantee, but I can see it happening given where things are moving.
llms-txt may be useful for responsible LLMs, but I'm skeptical it will do much about aggressive crawlers. The problematic ones already ignore robots.txt, spoof user agents, and rotate proxies, and I don't see how llms-txt would help with any of that.
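For reference, the voluntary mechanism in question is just a plain-text file served at the site root. A typical attempt to block ByteDance's crawler (whose documented user agent is Bytespider) looks like the fragment below, and it only works if the crawler chooses to honor it:

```
# robots.txt — purely advisory; a crawler that spoofs its
# user agent or ignores the file entirely is unaffected
User-agent: Bytespider
Disallow: /
```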
---
If this is of interest, I also recommend looking into: https://github.com/loderunner/scrt.
To me, it's a complement to 1Password.
I use it to save every new secret/api key I get via the CLI.
It's intentionally very feature limited.
Haven't tried it with agents, but I wouldn't be surprised if the CLI as-is would be enough.
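For anyone curious, the basic flow is roughly this. These commands are a sketch from memory, and the key name and value are made up, so double-check the repo's README before relying on any of it:

```shell
# One-time setup: create an encrypted store
# (backend choice and exact flags are in the README — elided here)
scrt init ...

# Save a new API key the moment you receive it
# ("stripe_api_key" and the value are hypothetical examples)
scrt set stripe_api_key "sk_live_..."

# Retrieve it later, e.g. to export into an environment variable
export STRIPE_API_KEY="$(scrt get stripe_api_key)"
```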