
ProPublica recently published an article (with an unseemly title) about the nightmare of scraping data from PDFs and putting it into tables. [1] They wrote some custom software to do the scraping. At the end of the article they mention Tabula, a PDF data table scraper recently released by Knight-Mozilla OpenNews Fellow Manuel Aristarán. [2] I haven't tested it yet.

[1] http://www.propublica.org/nerds/item/heart-of-nerd-darkness-...

[2] http://source.mozillaopennews.org/en-US/articles/introducing...


