There are established HTML sanitization solutions such as HTML Purifier for PHP, but I can't seem to find their counterparts for either Java or .NET. Code samples in C# and Java are floating around, but I haven't seen a project that has undergone serious testing and is being updated for new security threats on an ongoing basis.
Heck, I'd pay for a commercial component if it were well maintained. Any suggestions? Or, this being Hacker News, would someone want to create such a component as a mini-ISV startup?
Attempts to sanitize HTML (or SQL...) by eliminating a set of "dangerous" inputs inevitably end up as whack-a-mole processes as people keep discovering new evil things to throw at you. The only secure solution is to escape everything except a small set of "safe" inputs -- this is analogous to the situation with packet filtering firewalls, where "default deny all" is widely accepted as the right way to do things.
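To make the "default deny" idea concrete, here is a rough Java sketch, not a vetted library. It escapes every HTML metacharacter first and only then re-enables a tiny set of bare, attribute-free tags. The class name `AllowlistSanitizer` and the particular tag list are assumptions chosen for illustration, and the sketch does not check that tags are balanced.

```java
import java.util.regex.Pattern;

final class AllowlistSanitizer {
    // Hypothetical allowlist: only these exact lowercase tags, with no
    // attributes, are turned back into markup. Everything else stays escaped.
    private static final Pattern SAFE_TAG =
        Pattern.compile("&lt;(/?)(b|i|em|strong|p)&gt;");

    // Step 1 (deny all): escape every character that can start markup,
    // an entity, or an attribute value.
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Step 2 (allow a few): restore only escaped tags that match the allowlist.
    static String sanitize(String input) {
        return SAFE_TAG.matcher(escape(input)).replaceAll("<$1$2>");
    }
}
```

With this approach a tag carrying any attribute, such as `<b onclick="...">`, never matches the allowlist pattern and so stays escaped. Anything the allowlist does not anticipate fails safe as visible text instead of becoming live markup.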