
I would like to hear a little more on how you organized the search and what you are pre-processing and what you calculate on-the-fly. Thanks.


The database creation script[1] has a lot of Unix junk in it, but reading through the comments and echo statements should give you an idea of what it does. The end result is a SQLite database of about 9 GB with four tables, whose schemas are described in the README[2]. Two big things are precomputed: redirects are "auto-followed" to reduce the total graph size, and all incoming and outgoing links are stored in a |-separated string for each page (in the `links` table).
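To make the |-separated storage concrete, here is a minimal sketch of reading one page's links back out of SQLite. The column names (`id`, `incoming_links`, `outgoing_links`) and the helper names are assumptions for illustration only; the real schema is the one described in the README[2].

```python
import sqlite3

def parse_links(field):
    """Split a |-separated string of page IDs into a list of ints."""
    return [int(x) for x in field.split("|") if x]

# Hypothetical, simplified stand-in for the `links` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    "id INTEGER PRIMARY KEY, incoming_links TEXT, outgoing_links TEXT)"
)
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [
        (1, "", "2|3"),   # page 1 links to pages 2 and 3
        (2, "1", "3"),    # page 2 is linked from 1 and links to 3
        (3, "1|2", ""),   # page 3 is linked from 1 and 2
    ],
)

def fetch_outgoing(conn, page_id):
    """Return the outgoing link IDs for one page using a single query."""
    row = conn.execute(
        "SELECT outgoing_links FROM links WHERE id = ?", (page_id,)
    ).fetchone()
    return parse_links(row[0]) if row else []
```

The point of the design is that one row lookup returns every neighbor of a page, so the search never has to scan a per-edge table.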

Every time a query is made, a bi-directional breadth-first search[3] runs over the |-separated incoming and outgoing links; apart from searching from both ends, it is a fairly standard BFS. Because a lot of the hard work is precomputed, each search needs relatively few database queries and responds fairly quickly.
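For anyone unfamiliar with the idea, here is a rough sketch of a bi-directional BFS over in-memory adjacency dicts. It is not the code in [3]: the function names are made up, it returns only one shortest path, and it reads neighbors from plain dicts rather than from the database.

```python
def expand(frontier, neighbors, parents, other_parents):
    """Expand one BFS level; return (meeting_node or None, next_frontier)."""
    next_frontier = []
    for node in frontier:
        for nxt in neighbors.get(node, []):
            if nxt in parents:
                continue  # already reached from this side
            parents[nxt] = node
            if nxt in other_parents:
                return nxt, next_frontier  # the two searches have met
            next_frontier.append(nxt)
    return None, next_frontier

def build_path(meet, parents_fwd, parents_bwd):
    """Stitch source -> meet and meet -> target into one path."""
    path = []
    node = meet
    while node is not None:
        path.append(node)
        node = parents_fwd[node]
    path.reverse()
    node = parents_bwd[meet]
    while node is not None:
        path.append(node)
        node = parents_bwd[node]
    return path

def bidirectional_bfs(outgoing, incoming, source, target):
    """Return one shortest path from source to target, or None."""
    if source == target:
        return [source]
    parents_fwd, parents_bwd = {source: None}, {target: None}
    frontier_fwd, frontier_bwd = [source], [target]
    while frontier_fwd and frontier_bwd:
        # Expand whichever side currently has the smaller frontier.
        if len(frontier_fwd) <= len(frontier_bwd):
            meet, frontier_fwd = expand(
                frontier_fwd, outgoing, parents_fwd, parents_bwd)
        else:
            meet, frontier_bwd = expand(
                frontier_bwd, incoming, parents_bwd, parents_fwd)
        if meet is not None:
            return build_path(meet, parents_fwd, parents_bwd)
    return None

# Tiny example graph: outgoing links per page, with incoming links derived.
outgoing = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
incoming = {}
for page, targets in outgoing.items():
    for t in targets:
        incoming.setdefault(t, []).append(page)
```

The forward search follows outgoing links and the backward search follows incoming links, which is why having both directions stored per page is useful.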

[1] https://github.com/jwngr/sdow/blob/master/database/buildData...
[2] https://github.com/jwngr/sdow#database-creation-process
[3] https://github.com/jwngr/sdow/blob/master/sdow/breadth_first...


Thanks!



