Google it a few more times and Google will start feeding it back to people, who will also google it in disbelief; two years from now it'll be showing up in NYT headlines.
If you worked on Google's crawl scheduling, HN would be one of the sites you used to test out ideas for better scheduling heuristics, right?
I worked in indexing over a decade ago, but back then, after some basic constraints (per-IP rate limiting, don't re-check any page for updates too often, don't wait a crazy long time before re-checking any page, etc.), it was a bunch of arcane black magic heuristics to schedule pages for crawling.
These days, I imagine they have one ML model for the expected time until a given page shows up on the first page of search results for some query, another ML model for guessing how much the page has changed (cosine distance of some semantic embedding or something), and schedule based on the product of the two estimates. It's still probably lots of black magic heuristics, just now it's probably heuristics nobody can read.
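To make the "basic constraints" part concrete, here is a minimal sketch of that kind of gating, not anything Google actually ran. The names (`CrawlLimits`, `crawl_status`) and all the threshold values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlLimits:
    # All values are hypothetical placeholders, not real crawler settings.
    min_recheck_s: float = 3600.0          # don't re-check a page more often than this
    max_recheck_s: float = 30 * 86400.0    # don't leave a page unchecked longer than this
    per_ip_gap_s: float = 1.0              # minimum gap between requests to one IP


def crawl_status(now, last_crawl, last_hit_on_ip, limits=CrawlLimits()):
    """Classify a page as 'blocked', 'eligible', or 'overdue'.

    now, last_crawl, and last_hit_on_ip are timestamps in seconds.
    """
    # Per-IP rate limiting wins over everything else.
    if now - last_hit_on_ip < limits.per_ip_gap_s:
        return "blocked"
    age = now - last_crawl
    # Crawled too recently: skip regardless of any priority score.
    if age < limits.min_recheck_s:
        return "blocked"
    # Stale past the upper bound: crawl regardless of any priority score.
    if age >= limits.max_recheck_s:
        return "overdue"
    # In between is where the heuristics (or models) get to decide.
    return "eligible"
```

Only the "eligible" pages would then be ordered by whatever scoring heuristic or model sits on top.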
Can you give an example of "arcane black magic heuristics"? I cannot imagine what weird rules you could even come up with, aside from the normal "sensible" ones you already listed.
I was on a different team, but my third-hand knowledge from 20 years ago (notably, before deep learning became mainstream, even at Google) was that Google crawl scheduling had a bunch of heuristics to guess at the update frequency of a given page. The probability that a page has changed since you last crawled it is an important factor in the expected utility of crawling it now.
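As a rough illustration of why the change probability matters, here is a sketch that assumes page changes arrive as a Poisson process. That model is my assumption for the example, not something I know Google used, and `importance` is a hypothetical stand-in for whatever value a fresh copy of the page has.

```python
import math


def prob_changed(changes_per_day, days_since_crawl):
    """P(page changed at least once since the last crawl).

    Assumes changes follow a Poisson process with the given rate.
    """
    return 1.0 - math.exp(-changes_per_day * days_since_crawl)


def crawl_utility(importance, changes_per_day, days_since_crawl):
    """Expected utility of crawling now: value of a fresh copy times P(changed)."""
    return importance * prob_changed(changes_per_day, days_since_crawl)


def rank_pages(pages):
    """Return page names ordered by expected crawl utility, highest first.

    Each entry is (name, importance, changes_per_day, days_since_crawl).
    """
    scored = sorted(pages, key=lambda p: crawl_utility(p[1], p[2], p[3]), reverse=True)
    return [p[0] for p in scored]
```

Under this model, a page that rarely changes scores low even if it is important, and the heuristics in question amount to estimating `changes_per_day` for each page.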
As I mentioned, I expect that even more arcane heuristics, in the form of ML models, have largely or completely replaced the hand-written heuristics.