Ask HN: What do you use for HTML sanitization?
3 points by waldrews on Aug 16, 2008 | 4 comments
There are some established HTML sanitization solutions, like HTML Purifier for PHP, but I can't seem to find their counterparts for either Java or .NET. Code samples in C# and Java are floating around, but I haven't seen anything like a project that has undergone serious testing and is being updated for new security threats on an ongoing basis.

Heck, I'd pay for a commercial component if it were well maintained. Any suggestions? Or, this being Hacker News, would someone want to create such a component as a mini-ISV startup?



I don't do HTML sanitization. I escape everything except [A-Za-z0-9,. ].

Attempts to sanitize HTML (or SQL...) by eliminating a set of "dangerous" inputs inevitably end up as whack-a-mole processes as people keep discovering new evil things to throw at you. The only secure solution is to escape everything except a small set of "safe" inputs -- this is analogous to the situation with packet filtering firewalls, where "default deny all" is widely accepted as the right way to do things.
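
To make the "default deny" idea concrete, here is a minimal sketch in Python of escaping everything outside the safe set [A-Za-z0-9,. ]. The function name `escape_all_but_safe` is made up for illustration, and the sketch assumes the output lands in HTML text or a quoted attribute value; it is not a claim about how the commenter actually implements this.

```python
# Characters passed through unchanged; everything else is escaped.
SAFE_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789,. "
)


def escape_all_but_safe(text: str) -> str:
    """Replace every character outside SAFE_CHARS with a numeric
    HTML character reference (hypothetical helper, default deny)."""
    return "".join(
        ch if ch in SAFE_CHARS else f"&#{ord(ch)};"
        for ch in text
    )


print(escape_all_but_safe("Hi, <b>"))  # Hi, &#60;b&#62;
```

Because the rule lists what is allowed rather than what is forbidden, a newly discovered "evil" character is escaped without any change to the code.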


Escaping is important for all text fields, and I never use user text to build SQL strings without using some kind of typed parameter API.
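
As an illustration of what a parameter API buys you, here is a small sketch using Python's built-in `sqlite3` module. The `comments` table and the `find_comments_by_author` helper are invented for the example; the point is only that the user's text is passed as a bound value and never spliced into the SQL string.

```python
import sqlite3


def find_comments_by_author(conn, author):
    """Look up comment bodies by author using a bound parameter."""
    # The "?" placeholder keeps `author` out of the SQL text entirely.
    cur = conn.execute(
        "SELECT body FROM comments WHERE author = ?", (author,)
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical in-memory table, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")
conn.execute("INSERT INTO comments VALUES (?, ?)", ("alice", "hello"))

# A classic injection attempt is treated as an ordinary string value.
hostile = "alice' OR '1'='1"
print(find_comments_by_author(conn, "alice"))
print(find_comments_by_author(conn, hostile))
```

With string concatenation the second call would match every row; with the placeholder it matches none, because no author has that literal name.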

But I'd like to let the user submit formatted HTML via something like FCKEditor and have it checked against a limited whitelist.

I don't trust my own knowledge of browser-specific HTML rules to do this on my own. It's got to be a common enough scenario, right?
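
For a sense of what checking against a limited whitelist involves, here is a rough sketch built on Python's standard `html.parser`. The tag list, the `WhitelistSanitizer` class, and the `sanitize` function are assumptions made for illustration. It drops every attribute, removes tags that are not listed, and escapes all text. It does not balance unclosed tags, and text inside a removed tag is kept as escaped text, so treat it as a demonstration of the idea and not as a replacement for a tested, maintained library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: simple formatting tags only, no attributes.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li"}
VOID_TAGS = {"br"}  # tags that take no closing tag


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are discarded entirely (no onclick, href, style).
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so it can never be read back as markup.
        self.out.append(escape(data))


def sanitize(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="x()">hi</b> <a href="javascript:x">link</a>'))
# <b>hi</b> link
```

The browser-specific parsing quirks mentioned above are exactly what a sketch like this does not cover, which is why a maintained project is the safer choice.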


Whitelisting.


I use PeterBlum.com's controls for ASP.NET plus some hand-rolled stuff.



