A McDonald's kiosk is a masterpiece of engineering, perfectly designed to make as much money as possible. It's by no means lazily made or put together without thought or care. Every detail in the interface was decided after tons of experiments and hours of meetings.
Do you have an example query? Quotes for literal words work for me. Domain search was always with site:domain.com, not quotes, and works fine. I think Bing/DDG removed that feature recently?
She seems to assume that federations are supposed to be small groups where everyone knows each other. Has anyone claimed that?
Email and the web are the perfect examples of federations, because they are not controlled by a single company; they are interoperable. From Gmail I can send an email to a friend who uses Outlook. That way I'm not forced to use Outlook to communicate with him, and I can use any email provider I want. The fact that Google and Microsoft are trillion-dollar companies is irrelevant here.
Same with the web, I can change my ISP and keep accessing the exact same webpages as before because the web is federated, and not controlled by my ISP. The fact that I use it to visit pages controlled by other large companies is irrelevant.
Facebook and WhatsApp, on the other hand, are not federated. If I want to communicate with my friends who use Facebook or WhatsApp, I have no choice but to use Facebook and WhatsApp.
Any simple heuristic has false positives, meaning it will end up taking down legitimate sites that repeat content for a good reason.
Say, for example, two sites quote text from the US Constitution. The second one to be crawled would be considered spam copying the first and removed from web results. Then you'll get comments on Hacker News complaining that Google is censoring it for political reasons.
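A minimal sketch of why a naive duplicate-content heuristic misfires on exactly this case. The shingle-overlap approach and the threshold here are illustrative assumptions, not how any real search engine works:

```python
# Naive duplicate detection: flag a page as a copy if too many of its
# word 5-grams ("shingles") already appear on a previously crawled page.
def shingles(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_like_copy(new_page, crawled_page, threshold=0.5):
    new, old = shingles(new_page), shingles(crawled_page)
    if not new:
        return False
    overlap = len(new & old) / len(new)
    return overlap >= threshold

quote = ("We the People of the United States, in Order to form a more "
         "perfect Union, establish Justice, insure domestic Tranquility")
site_a = "Constitutional law primer. " + quote
site_b = "Annotated civics notes. " + quote

# Both sites legitimately quote the same text, but the second one crawled
# gets flagged as a copy of the first: a false positive.
print(looks_like_copy(site_b, site_a))
```

Both pages are legitimate, yet the second one trips the threshold purely because the shared quotation dominates its content.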
And any simple heuristic is quickly reverse-engineered by SEOs, who will find a way to make their spam pass as legitimate.
They could use the heuristics to build a list of domains to block and then have someone review it. After doing it for a long time, they could build a neural model on top of that, and automate it.
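A sketch of that pipeline, with made-up heuristics, weights, and domain names purely for illustration:

```python
# Step 1: cheap heuristics score each domain; high scorers become
# *candidates* for blocking, not automatic blocks.
SPAM_TLDS = {"xyz", "top"}  # illustrative, not a real spam-TLD list

def spam_score(domain, pages_copied_ratio, keyword_stuffing_ratio):
    score = 0.0
    if domain.rsplit(".", 1)[-1] in SPAM_TLDS:
        score += 0.3
    score += 0.4 * pages_copied_ratio      # share of pages duplicating other sites
    score += 0.3 * keyword_stuffing_ratio  # share of pages with stuffed keywords
    return score

# (pages_copied_ratio, keyword_stuffing_ratio) per domain, from a crawl
crawl_stats = {
    "recipes-best-seo.xyz": (0.9, 0.8),
    "example-blog.com": (0.1, 0.0),
}

candidates = [d for d, (dup, stuff) in crawl_stats.items()
              if spam_score(d, dup, stuff) > 0.5]

# Step 2: a human reviews the candidate list before anything is blocked.
# Step 3 (later): the accumulated (features, human verdict) pairs become
# training data for a model that automates the review.
print(candidates)
```

The point is the shape of the loop: heuristics propose, humans dispose, and the human verdicts eventually become labels for an automated model.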
As I have said, the reason they don't do it is not because they don't have the skills and know-how.
There are billions of dollars at stake on both sides, search engines and spammers, in an endless arms race that has been going on for more than 20 years.
Trust me, it's beyond naive to call fighting webspam a low-hanging-fruit problem.
Why should I trust you? I trust my own eyes. I regularly see spam sites at the top of results, seen by many people for months. These could be filtered out with a one-line change.
Thanks for the reply! It's actually up and running, but if the response says "forbidden", it's likely because I blanket-block a lot of non-US IPs, AWS IP ranges, etc. to keep out annoying crawlers. This is bad practice, but I do it for several reasons. I've turned off some of the blanket-blocking for now.
If you see this comment, would you mind sharing whether you were making the request from a US-based IP, a VPN, or from outside the US? Just curious; it'll help me understand things a bit better.
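For anyone curious, blanket-blocking by CIDR range can be sketched like this. AWS does publish its actual ranges at https://ip-ranges.amazonaws.com/ip-ranges.json; the two prefixes below are a hardcoded sample for illustration only, and the function name is mine:

```python
import ipaddress

# Illustrative sample of blocked prefixes; in practice you'd load the
# full list from AWS's published ip-ranges.json and GeoIP data.
BLOCKED_RANGES = [ipaddress.ip_network(p) for p in ("3.0.0.0/15", "52.94.0.0/22")]

def is_blocked(ip):
    """Return True if the client IP falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)

print(is_blocked("3.1.2.3"))   # falls inside 3.0.0.0/15
print(is_blocked("8.8.8.8"))   # not in the sample ranges
```

In a real deployment this check usually lives in front of the app (firewall, CDN, or reverse proxy) rather than in application code, which is part of why it's so coarse.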
They do care a lot, but about the wrong thing.