I googled "cactodes" and this comment was the 7th result on the first page.

ant6n · on Nov 10, 2022

7th result on the first page? Sounds like an authoritative source to me!

pessimizer · on Nov 10, 2022

Google it a few more times, google will start feeding it back to people who will also google it in disbelief; two years from now they're using it in NYT headlines.

lostlogin · on Nov 10, 2022

It’s 6th for me, and no adverts. How does this get monetised?

orangepurple · on Nov 10, 2022

Something something pr0n

dredmorbius · on Nov 11, 2022

Paige, no!

CalRobert · on Nov 10, 2022

Poor DuckDuckGo doesn't even have it on page 1.

oliwary · on Nov 11, 2022

Does not show up for me at all in results, interesting. I am outside, wonder if they have geographically separated caches.

valarauko · on Nov 10, 2022

8th for me on Kagi

ch4s3 · on Nov 10, 2022

It's 6th now.

knaekhoved · on Nov 10, 2022

Impressive indexing speed.

KMag · on Nov 11, 2022

If you worked on Google's crawl scheduling, HN would be one of the sites you used to test out ideas for better scheduling heuristics, right?

I worked in indexing over a decade ago, but back then, after some basic constraints (per-IP rate limiting, don't re-check any page for updates too often, don't wait a crazy long time before re-checking any page, etc.), it was a bunch of arcane black magic heuristics to schedule pages for crawling.

These days, I imagine they have one ML model for the expected time until a given page shows up on the first page of search results for some query, another ML model for guessing how much the page has changed (cosine distance of some semantic embedding or something), and schedule based on the product of the two estimates. It's still probably lots of black magic heuristics, just now it's probably heuristics nobody can read.
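To make the "product of the two estimates" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `PageEstimate` fields stand in for the outputs of the two imagined models, and inverting the time estimate (so that pages expected to surface sooner get higher urgency) is an assumption of this sketch, not something known about any real crawler.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PageEstimate:
    url: str
    # Hypothetical output of model 1: expected days until the page
    # would appear on the first page of results for some query.
    expected_days_until_first_page: float
    # Embedding of the page as of the last crawl.
    old_embedding: Sequence[float]
    # Hypothetical output of model 2: predicted embedding of the page now.
    predicted_embedding: Sequence[float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def crawl_priority(page: PageEstimate) -> float:
    """Product of the two estimates (urgency times predicted change)."""
    # Assumption: a shorter expected time means higher urgency, so invert it.
    urgency = 1.0 / max(page.expected_days_until_first_page, 1e-9)
    change = cosine_distance(page.old_embedding, page.predicted_embedding)
    return urgency * change


def schedule(pages: Sequence[PageEstimate]) -> List[str]:
    """Return URLs ordered from highest to lowest crawl priority."""
    ranked = sorted(pages, key=lambda p: (-crawl_priority(p), p.url))
    return [p.url for p in ranked]
```

With this scoring, a page that is likely to surface soon but is predicted not to have changed gets a priority of zero, which matches the intuition that re-crawling it buys nothing.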

krick · on Nov 11, 2022

Can you give an example of "arcane black magic heuristics"? I can't imagine what weird rules you could even come up with, aside from the normal "sensible" ones you already listed.

KMag · on Nov 23, 2022

I was on a different team, but my third-hand knowledge from 20 years ago (notably, before deep learning became mainstream, even at Google) was that Google crawl scheduling had a bunch of heuristics to guess at the update frequency of a given page. The probability that a page has changed since you last crawled it is an important factor in the expected utility of crawling it now.
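As an illustration of how an update-frequency guess can feed into the probability that a page has changed, here is a sketch that assumes page edits follow a Poisson process. That is a common textbook simplification, not a description of the actual heuristics, and the function names are made up for this example.

```python
import math
from typing import Sequence, Tuple


def estimate_change_rate(observations: Sequence[Tuple[float, bool]]) -> float:
    """Naive changes-per-day estimate from past crawls.

    Each observation is (days_since_previous_crawl, page_had_changed).
    Several edits inside one interval are only seen as a single change,
    so this estimator tends to underestimate the true rate.
    """
    total_days = sum(days for days, _ in observations)
    if total_days <= 0:
        return 0.0
    changes_seen = sum(1 for _, changed in observations if changed)
    return changes_seen / total_days


def probability_changed(rate_per_day: float, days_since_crawl: float) -> float:
    """P(at least one change) under a Poisson model: 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate_per_day * days_since_crawl)
```

For example, a page seen changed in 2 of 3 crawls spanning 4 days gets a rate of 0.5 changes per day, so two days after the last crawl the model puts the chance of a change at 1 - exp(-1), roughly 63%.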

As I mentioned, I expect that even more arcane heuristics, in the form of ML models, have largely or completely replaced the hand-written heuristics.