Well at that point, who really cares if the content of the 1001st results page is deterministic, or in perfect order? Get the first 100 or so pages right, and thereafter just request the nth slice of results from each of the m shard machines. No merge and no memory explosion; you'll just get them slightly out of global order.
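A minimal sketch of that idea, with made-up shard counts and page sizes: for deep pages, skip the global merge entirely and just concatenate a fixed slice from each shard's locally-ranked list.

```python
# Hedged sketch, not how any real engine serves deep pages.
# Shard names, counts, and sizes here are invented for illustration.

PAGE = 10   # results per page
M = 5       # number of shards

# each shard holds its own locally-ranked result list
shards = [[f"shard{s}-doc{r}" for r in range(1000)] for s in range(M)]

def deep_page(k):
    """Approximate page k (0-based) without merging across shards."""
    per_shard = PAGE // M   # take an equal slice from every shard
    out = []
    for shard in shards:
        out.extend(shard[k * per_shard:(k + 1) * per_shard])
    # slightly out of global rank order, but no merge or buffering needed
    return out

print(deep_page(100)[0])
```

The trade-off is exactly the one described above: the page is cheap to produce, but the ordering is only locally correct within each shard's slice.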
You still need to filter based on the other indexes. If you search for [bitcoin mining], you don't want to find pages about coal mining. So this data still needs to be joined.
The term of art for this is intersection: the posting lists for the two terms are intersected, then the surviving results are ranked. A production search engine has a lot more steps than that, though.
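For the core step, a sketch of the classic two-pointer intersection of sorted posting lists (lists of doc IDs). A real index stores these compressed with skip pointers, but the walk is the same idea:

```python
def intersect(a, b):
    """Return doc IDs present in both sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])   # doc contains both terms
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1             # advance the list that is behind
        else:
            j += 1
    return out

# say "bitcoin" appears in docs 2, 5, 9, 14 and "mining" in 3, 5, 9, 21
print(intersect([2, 5, 9, 14], [3, 5, 9, 21]))  # -> [5, 9]
```

This runs in time linear in the combined list lengths, which is why posting lists are kept sorted by doc ID in the first place.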
The long and short of it is: if you really want the full results, join Google, join the search team, and get enough experience that you can run full queries over the docjoins directly. This was part of Norvig's pitch to attract researchers a while ago. For a research project, I built a regular expression that matched DNA sequences and spat out a list of all pages containing what looked like DNA, then annotated those pages, so in principle you could have done dna:<whatever sequence>. But obviously that was not a goal for the search team.
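The actual pattern isn't given above, but a toy version in the same spirit might just flag long runs of the four bases. Real DNA detection needs more care (mixed case, whitespace, FASTA headers); this is only illustrative:

```python
import re

# Hypothetical pattern: a "DNA-looking" string is 20+ consecutive
# A/C/G/T characters bounded by non-word characters.
DNA_RE = re.compile(r"\b[ACGT]{20,}\b")

page = "We sequenced the fragment ACGTACGTACGTACGTACGTA and found..."
print(DNA_RE.findall(page))  # -> ['ACGTACGTACGTACGTACGTA']
```

Pages matched this way could then be annotated with the sequences they contain, which is what would make a dna:<sequence> operator answerable from the index.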