Kiwix - most people are too conditioned to think that search has to happen online and don't even realize what is possible offline.
Entire web archives, such as the full dumps of Wikipedia and Stack Exchange (including media and indexes for search), can be stored locally. The missing piece is Google-level search quality on the local machine. Brute force substring search can already process gigabytes in seconds, and on enterprise-grade server hardware throughput is reaching 1000 GB/s. At that rate, there is no reason to think that in a couple of years local search of all known human knowledge can't happen on a local device at Google-level result quality.
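To get a feel for the rates involved, here's a minimal, unscientific sketch (buffer size and needle are arbitrary choices) timing a brute force substring scan in plain Python:

    # Minimal sketch, not a rigorous benchmark: time a brute force
    # substring scan over an in-memory buffer to see the GB/s rate.
    # The 1 GiB buffer and the needle are arbitrary, for illustration.
    import time

    SIZE = 1 << 30                      # 1 GiB of haystack data
    haystack = b"a" * SIZE + b"needle"  # match only at the very end

    start = time.perf_counter()
    pos = haystack.find(b"needle")      # CPython's optimized substring search
    elapsed = time.perf_counter() - start

    print(f"found at {pos}, scanned at {SIZE / elapsed / 1e9:.1f} GB/s")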
For anyone interested in the search space, look into what's possible today in local offline search.
For the average person this rate does not matter. They don't need access to the cutting edge of quantum physics, astronomy, dance, art, or JavaScript.
All you have to do is look at the speed at which new info is being added to Wikipedia and Stack Overflow, which is stabilizing, i.e. it is not growing as it once was. Basic/foundational knowledge is more or less all covered. https://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia%...
And that sum total comes to 50-60 GB compressed. Think about that number. It's not big.
It would be awesome if you could download dumps of Wikipedia filtered by category so you can get the size down. There's probably a lot of information in there that is useless to me.
You do not even need "Google level" for most of today's web users; you can deliver what they need from web search with much less.
For example, a simple "<title>" search. This is how Google started.
The entry point into the web should be search for domains. A "<title>" search can do that.
Most users today do not do much searching within websites via Google. They search for websites using Google.
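For a sense of how little a title-only search needs, here's a minimal sketch, with made-up pages, that extracts <title> tags and does a substring match over nothing but the titles:

    # Minimal sketch of a "<title>" search: extract titles from HTML
    # pages and match queries against titles only. The `pages` dict is
    # a stand-in for a real crawl or dump; everything is illustrative.
    import re

    pages = {
        "https://example.org/": "<html><head><title>Example Domain</title></head>...",
        "https://example.org/kiwix": "<html><head><title>Offline archives with Kiwix</title></head>...",
    }

    TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

    # Build the index: URL -> title. Even for all of Wikipedia this is
    # only tens of millions of short strings, which fits in RAM.
    index = {
        url: m.group(1).strip()
        for url, html in pages.items()
        if (m := TITLE_RE.search(html))
    }

    def title_search(query: str) -> list[tuple[str, str]]:
        """Return (url, title) pairs whose title contains the query."""
        q = query.lower()
        return [(url, t) for url, t in index.items() if q in t.lower()]

    print(title_search("kiwix"))
    # [('https://example.org/kiwix', 'Offline archives with Kiwix')]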
Anyway, you are right about storage space and offline search, but that truth obviously misaligns with the "cloud" business narrative and with coaxing users to store all their personal data in datacenters instead of on their desks or in their pockets.
I'd say the average user especially benefits from a search system that's somewhat clever and finds things even when they don't phrase the query exactly right.
And searching for domains is only a tiny part of it, especially now that a lot of information is stuck inside large general-purpose sites (wikis, Q&A sites, social media) rather than on special-interest sites. And for many generic searches, the special-interest domains that do exist are various levels of spam/affiliate marketing.
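As a toy illustration of "somewhat clever", even approximate string matching over titles catches misspelled queries; a minimal sketch using Python's standard difflib (the titles here are made up):

    # Minimal sketch of tolerant matching: find titles close to a
    # misspelled query using only the standard library. Real engines
    # use stemming, trigram or vector indexes, but the principle is
    # the same. The titles are made up for the example.
    import difflib

    titles = [
        "Quantum entanglement",
        "JavaScript engine",
        "Affiliate marketing",
    ]

    # "entanglment" is misspelled, yet the right article is found.
    print(difflib.get_close_matches("quantum entanglment", titles, n=1, cutoff=0.6))
    # ['Quantum entanglement']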
PCIe 3.0 x16 devices have a 16 GB/s theoretical max, so 1000 GB/s is still out of reach for single-machine I/O (though it's not as though search needs anywhere near these bandwidths anyway).
The Intel Core i9-7900X has 44 PCIe 3.0 lanes, and Wikipedia tells me each lane has a throughput of 984.6 MB/s, so that's about 43 GB/s in aggregate; fast compression could maybe add a small integer multiple on top.
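Back-of-the-envelope, combining that aggregate figure with the 50-60 GB dump size mentioned upthread (these are just the numbers from this thread, nothing measured):

    # Back-of-the-envelope using figures from this thread, not measured.
    lanes = 44             # PCIe 3.0 lanes on the Core i9-7900X
    per_lane_gbs = 0.9846  # GB/s per PCIe 3.0 lane
    dump_gb = 60           # compressed Wikipedia + Stack Exchange, upper end

    total_gbs = lanes * per_lane_gbs
    print(f"aggregate I/O: {total_gbs:.1f} GB/s")             # ~43.3 GB/s
    print(f"full scan of dump: {dump_gb / total_gbs:.2f} s")  # ~1.39 s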