The biggest problem I have seen with AI scraping is that once they find your site they blindly try every possible combination of URLs and blast every page they can find 100 times per second.
They don't respect robots.txt, they don't care about your sitemap, they don't bother caching; they just mindlessly churn away, which is effectively a DDoS.
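For what it's worth, honoring robots.txt costs a crawler almost nothing; Python even ships a parser in the standard library. A minimal sketch of what a polite crawler does before each fetch (the site and the bot name here are made up):

    import urllib.robotparser

    # A polite crawler fetches robots.txt once and consults it before every request.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # hypothetical site
    rp.read()

    url = "https://example.com/some/page.html"
    if rp.can_fetch("ExampleBot", url):            # made-up user agent string
        delay = rp.crawl_delay("ExampleBot") or 1  # honor Crawl-delay if one is set
        print(f"allowed to fetch {url}, pausing {delay}s between requests")
    else:
        print(f"robots.txt disallows {url}, skipping")

The complaint above is that the AI scrapers skip this step entirely, not that it's hard to do.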
Google at least played nice.
And so that is why things like Anubis exist, and why people flock to Cloudflare and all the other tried-and-true methods of blocking bots.
I don't see how that is possible. A web site's link graph is disconnected, with a lot of separate components. If they get hold of a URL, maybe that gets them to a few other pages, but not all of them. Most of the pages on my personal site are .txt files with no outbound links, for that matter. Nothing to navigate.
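To make the reachability point concrete: a crawler that only follows links is doing a graph traversal from its seed URL, so it can only ever reach pages in that URL's connected component. A toy sketch with an invented link structure (the page names are hypothetical):

    from collections import deque

    # Hypothetical site link structure: page -> pages it links to.
    links = {
        "/index.html": ["/about.html", "/posts.html"],
        "/about.html": ["/index.html"],
        "/posts.html": ["/index.html"],
        "/notes/a.txt": [],  # plain .txt pages with no outbound links
        "/notes/b.txt": [],  # and nothing linking to them either
    }

    def reachable(seed):
        """Breadth-first traversal: everything a link-following crawler can discover."""
        seen, queue = {seed}, deque([seed])
        while queue:
            page = queue.popleft()
            for nxt in links.get(page, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    print(sorted(reachable("/index.html")))
    # ['/about.html', '/index.html', '/posts.html'] -- the .txt pages never show up
    # unless the crawler guesses their URLs some other way.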