
Agreed. So much so that I started two search engines: https://bonzamate.com.au/, which is explicitly for Australian sites, and https://searchcode.com/, which is about to get a large index update that should make it more useful.

I have often wondered if perhaps a distributed search engine is the real answer. YaCy was interesting but the results were terrible. But perhaps it’s possible to use the ActivityPub protocol to build a distributed search that has different implementations on the backend. It’s something I keep toying with in my spare time, and with what I did with bonzamate, where the search is done in Lambda functions on AWS, it may be possible to roll out in a way that’s cheap for everyone to build out. The ability to federate between those providing good results, even at query time, could be very powerful if implemented correctly.



I just finished reading this [0]. Your story-telling abilities are great. Your technical abilities as well. As a hobbyist search engine "engineer" doing independent research and slowly building on my search engine code base, I would like to subscribe to your newsletter.

I'd also like to mention that I'll be using [0] as a template for how to describe to the public my own process of researching and my thought process for picking and choosing a particular piece of tech/data structure/solution.

Just a splendid read.

[0] https://boyter.org/posts/abusing-aws-to-make-a-search-engine...


Literally the nicest comment I have gotten in years. Thanks so much.

You can follow me on Twitter, use my blog's RSS, or email me if you like. You can get all the details on my profile :)


Searchcode looks great. Wish it had a recently indexed sorter.


I’m about to replace the index, and that is an often-requested feature. So yes, it’s coming soon.


> I have often wondered if perhaps a distributed search engine is the real answer.

Distribution seems like the exact opposite of being reliable? How would the network defend against bad actors?


The same way any other federated system does. Allow, ignore, block and such... The whole point is you get to choose. If someone happens to get a lot of requests from other peers they might be considered a reliable source.

However, you would probably want peers to return ranking information, so the caller can re-rank the combined results from its own system and from others. Perhaps returning this could be optional?

Or even have the federated searches share portions of the index when peering with others. This would allow one system to determine if the ranks appear in line with expectations, allowing one to trust but verify.
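
To make the re-ranking idea a bit more concrete, here is a minimal Python sketch of what the caller might do with scored results from several peers. Everything in it is an assumption for illustration: the `Hit` shape, the per-peer trust weights, the 0..1 score range and the `rerank` name are hypothetical, and no existing federated search protocol is being described.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    score: float  # rank score reported by the peer, assumed normalised to 0..1
    peer: str     # hypothetical identifier of the peer that returned the hit


def rerank(hits, trust, blocked=frozenset()):
    """Merge hits from several peers into one list of (url, score), best first."""
    combined = {}
    for hit in hits:
        if hit.peer in blocked:
            continue  # "block": drop results from peers the caller rejects
        weight = trust.get(hit.peer, 0.0)
        if weight <= 0.0:
            continue  # "ignore": peers without a trust weight contribute nothing
        # "allow": the same URL reported by several trusted peers accumulates score
        combined[hit.url] = combined.get(hit.url, 0.0) + weight * hit.score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


hits = [
    Hit("https://example.com/a", 0.9, "local"),
    Hit("https://example.com/b", 0.8, "peer-1"),
    Hit("https://example.com/a", 0.5, "peer-1"),
    Hit("https://example.com/spam", 1.0, "peer-2"),
]
ranked = rerank(hits, trust={"local": 1.0, "peer-1": 0.5}, blocked={"peer-2"})
```

Here the caller's own index is weighted above a peer it only partly trusts, and a blocked peer is dropped entirely. Comparing a peer's reported scores against a shared portion of its index would be a separate step on top of this.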


> The same way any other federated system does. Allow, ignore, block and such...

Ah, “it would not”, sounds absolutely great.


What are the usage stats for the Aussie search engine?


Honestly, I have no idea. There is no tracking of any kind on it. I should probably hook one of those newer AWStats-style implementations into the Caddy logs for it to get some idea though.


You don’t need tracking to have usage information.


True. I just never bothered with anything. I get some basic details out of CloudWatch, but that's about it. As mentioned, I think I should do something with what comes out of Caddy one of these days.
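
As a rough illustration of getting usage numbers from server logs without any tracking, here is a minimal Python sketch that counts requests per day. It assumes JSON access logs with one object per line and a `ts` field holding Unix seconds, which I believe is Caddy's default but is configurable, so treat the field name and format as assumptions. The `requests_per_day` name is made up for this example.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def requests_per_day(lines):
    """Count requests per UTC day from JSON access-log lines."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
            ts = float(entry["ts"])  # assumed: Unix timestamp in seconds
        except (ValueError, KeyError, TypeError):
            continue  # skip blank lines, non-JSON lines and entries without a usable timestamp
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        counts[day] += 1
    return dict(counts)
```

Passing an open log file to it works, since the function only iterates over lines. Nothing about individual visitors is stored, only a count per day.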



