
BeautifulSoup is in fact still actively maintained. “The current release is Beautiful Soup 4.1.3 (August 20, 2012).”

http://www.crummy.com/software/BeautifulSoup/

I hear it recommended the most among Pythonistas, and it's plenty clean and fast for my use. But if you're skeptical, I'd still look for a more up-to-date benchmark (or run your own) rather than rely on results from more than four years ago.
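If you do want to run your own, a rough sketch could look like the following. It is only an illustration: the sample document, the link-counting task, and the helper names (`count_stdlib`, `available_parsers`, `run_benchmark`) are made up for this example, and it times only the parsers that happen to be installed. Real pages from the sites you scrape would give more meaningful numbers than this synthetic input.

```python
import timeit
from html.parser import HTMLParser

# Synthetic input (an assumption for illustration); use real pages for real numbers.
SAMPLE = "<html><body>" + "<p>row <a href='/item'>link</a></p>" * 500 + "</body></html>"


class LinkCounter(HTMLParser):
    """Counts <a> start tags using only the standard library parser."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1


def count_stdlib(html):
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.links


def available_parsers():
    """Return {name: callable} for each parser that can be imported here."""
    parsers = {"html.parser (stdlib)": count_stdlib}
    try:
        from lxml import etree  # third-party, optional
        parsers["lxml HTMLParser"] = lambda html: len(
            etree.fromstring(html, etree.HTMLParser()).findall(".//a")
        )
    except ImportError:
        pass
    try:
        from bs4 import BeautifulSoup  # third-party, optional
        parsers["BeautifulSoup (html.parser)"] = lambda html: len(
            BeautifulSoup(html, "html.parser").find_all("a")
        )
    except ImportError:
        pass
    return parsers


def run_benchmark(html=SAMPLE, repeat=3):
    """Best-of-`repeat` wall-clock time in seconds for one parse per parser."""
    results = {}
    for name, fn in available_parsers().items():
        results[name] = min(timeit.repeat(lambda: fn(html), number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    for name, secs in sorted(run_benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name}: {secs * 1000:.1f} ms")
```

Taking the minimum over a few repeats reduces noise from other processes; it says nothing about memory use or how each parser copes with malformed markup.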



Looks like things have changed since the last time I checked. Thank you for pointing this out. Next time I'll check my facts twice before posting.

Still, since lxml is basically a binding to libxml2, the performance comparison between the two libraries should still hold. I heard lxml recommended too, in a Python talk about scraping one or two years ago at most.

BeautifulSoup may still be better for parsing broken documents, though I never had problems with lxml while using it on a very large variety of sites.


You can use BeautifulSoup with lxml if you like, although I just use the HTMLParser in lxml these days and don't use BeautifulSoup any more. It seems to work a little better, at least for my uses.

http://lxml.de/elementsoup.html
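For anyone who hasn't tried it, using lxml's HTML parser directly looks roughly like this. It's a minimal sketch: the broken sample markup and the `extract_links` helper are invented for the example, and lxml is a third-party package that has to be installed separately.

```python
try:
    from lxml import etree  # third-party: pip install lxml
    HAVE_LXML = True
except ImportError:
    HAVE_LXML = False

# Deliberately sloppy markup (hypothetical sample): unclosed <p>, <a>, <body>, <html>.
BROKEN_HTML = "<html><body><p>first <a href='/a'>one</a><p>second <a href='/b'>two"


def extract_links(html):
    """Parse possibly malformed HTML with lxml and return all href values."""
    parser = etree.HTMLParser()  # lenient HTML parser backed by libxml2
    root = etree.fromstring(html, parser)
    return [str(href) for href in root.xpath("//a/@href")]


if __name__ == "__main__":
    if HAVE_LXML:
        print(extract_links(BROKEN_HTML))
    else:
        print("lxml is not installed")
```

If you'd rather keep the BeautifulSoup API, Beautiful Soup 4 accepts lxml as its backing parser via `BeautifulSoup(html, "lxml")`.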



