
I have a simple website consisting solely of static webpages pointing to a bunch of .zip binaries. Nothing dynamic, all highly cacheable. The bots are re-downloading the binaries over and over. In the logs I can see Bingbot downloading a .zip file, and then an hour later another Bingbot instance, from a different IP in the same range, downloading the same .zip file in full. These files were uploaded years ago, have never changed since, and contain nothing crawlable (just executable code).
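For anyone who wants to check their own logs for the same pattern, here is a rough sketch of how it could be done. It assumes the server writes the common "combined" access log format; the function name, the sample log lines, the IPs (documentation ranges) and the file paths are all made up for illustration and are not from my actual logs.

```python
import re
from collections import defaultdict

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def repeated_zip_downloads(lines, agent_substring="bingbot"):
    """Return {path: [ip, ...]} for .zip files fetched in full more than once.

    Hypothetical helper: only GET requests answered with 200 are counted,
    so 304 (not modified) and 206 (partial) responses are ignored.
    """
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        if m["method"] != "GET" or m["status"] != "200":
            continue
        if not m["path"].lower().endswith(".zip"):
            continue
        if agent_substring not in m["agent"].lower():
            continue
        hits[m["path"]].append(m["ip"])
    # Keep only files that were downloaded in full at least twice.
    return {path: ips for path, ips in hits.items() if len(ips) > 1}


# Made-up sample lines, for illustration only.
BOT = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
SAMPLE = [
    f'203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET /files/tool.zip HTTP/1.1" 200 1048576 "-" "{BOT}"',
    f'203.0.113.9 - - [01/Jan/2024:11:00:00 +0000] "GET /files/tool.zip HTTP/1.1" 200 1048576 "-" "{BOT}"',
    f'203.0.113.9 - - [01/Jan/2024:11:05:00 +0000] "GET /files/other.zip HTTP/1.1" 304 0 "-" "{BOT}"',
    '198.51.100.7 - - [01/Jan/2024:11:10:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for path, ips in repeated_zip_downloads(SAMPLE).items():
        print(path, len(ips), "full downloads from", sorted(set(ips)))
```

Matching on the user-agent string is only a heuristic, since anyone can claim to be Bingbot, and paths with query strings would need extra handling.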

Web crawlers have been around for years, but many of the current ones are more indiscriminate and less well behaved.


