
Don't have any index pages or heavy cross-linking between pages.

None of that matters. AI bots can still figure out how to navigate the website.

The biggest problem I have seen with AI scraping is that the bots blindly try every possible combination of URLs once they find your site, and blast it 100 times per second for each page they can find.

They don’t respect robots.txt, they don’t care about your sitemap, and they don’t bother caching; they just mindlessly churn away, effectively a DDoS.
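If robots.txt is ignored, about the only thing left is enforcing limits yourself. A minimal sketch of a per-IP token bucket you could put in front of an app; the rate and burst numbers here are made up, tune them for your own traffic:

    import time
    from collections import defaultdict

    RATE = 2.0    # tokens refilled per second, per client (assumption)
    BURST = 10.0  # maximum bucket size (assumption)

    buckets = defaultdict(lambda: (BURST, time.monotonic()))

    def allow(client_ip: str) -> bool:
        """Token-bucket check: True if this request may proceed."""
        tokens, last = buckets[client_ip]
        now = time.monotonic()
        tokens = min(BURST, tokens + (now - last) * RATE)  # refill since last hit
        if tokens < 1.0:
            buckets[client_ip] = (tokens, now)
            return False  # over the limit: respond with 429
        buckets[client_ip] = (tokens - 1.0, now)
        return True

In practice you'd do this at the reverse proxy rather than in the app, but the logic is the same.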

Google at least played nice.

And that is why things like Anubis exist, and why people flock to Cloudflare and all the other tried-and-true methods of blocking bots.
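The idea behind an Anubis-style challenge is proof of work: the server hands the client a random challenge, and the client must burn CPU finding a nonce whose hash clears a difficulty bar. That cost is negligible for one human page load but adds up fast at scraper scale. A rough sketch of the concept, not Anubis's actual protocol (the difficulty value is an assumption):

    import hashlib, os, itertools

    DIFFICULTY = 16  # leading zero bits required (assumed value)

    def solve(challenge: bytes) -> int:
        """Client side: grind nonces until the hash clears the bar."""
        for nonce in itertools.count():
            digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
            if int.from_bytes(digest[:4], "big") >> (32 - DIFFICULTY) == 0:
                return nonce

    def verify(challenge: bytes, nonce: int) -> bool:
        """Server side: one hash to check what cost the client thousands."""
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        return int.from_bytes(digest[:4], "big") >> (32 - DIFFICULTY) == 0

    challenge = os.urandom(16)
    print(verify(challenge, solve(challenge)))  # True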


I don't see how that is possible. The website is a disconnected graph with many components. If they get hold of a URL, maybe that gets them to a few other pages, but not all of them. Most of the pages on my personal site are .txt files with no outbound links, for that matter. Nothing to navigate.
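That reasoning is just graph reachability: a crawler that only follows links can never leave the connected component containing its seed URL. A toy illustration with a made-up link graph:

    from collections import deque

    # Hypothetical site: a linked component plus one orphaned page.
    links = {
        "/":        ["/about", "/posts"],
        "/about":   ["/"],
        "/posts":   ["/posts/1"],
        "/posts/1": [],
        "/notes/secret.txt": [],  # no inbound links anywhere
    }

    def crawl(seed):
        """BFS over outbound links; returns every page reachable from seed."""
        seen, queue = {seed}, deque([seed])
        while queue:
            page = queue.popleft()
            for nxt in links.get(page, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    print(crawl("/"))  # /notes/secret.txt is never discovered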

How? If you don't have a default page and index listings are disabled, how can they derive page names?
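Often they don't derive them at all; they guess. Dictionary-style probing of common file and directory names is cheap, and unlinked pages can also leak via external links, referer logs, or archives. A hypothetical probe loop; the target and wordlist are placeholders:

    import urllib.request, urllib.error

    BASE = "https://example.com"  # placeholder target
    WORDLIST = ["index.html", "notes.txt", "backup.zip", "old/", "admin/"]

    for path in WORDLIST:
        url = f"{BASE}/{path}"
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                print(resp.status, url)  # page exists, linked or not
        except urllib.error.HTTPError as e:
            print(e.code, url)           # 404 and friends
        except urllib.error.URLError:
            pass                         # connection failure, skip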


