I don't do HTML sanitization. I escape everything except [A-Za-z0-9,. ].
Attempts to sanitize HTML (or SQL...) by eliminating a set of "dangerous" inputs inevitably end up as whack-a-mole processes as people keep discovering new evil things to throw at you. The only secure solution is to escape everything except a small set of "safe" inputs -- this is analogous to the situation with packet filtering firewalls, where "default deny all" is widely accepted as the right way to do things.
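For what it's worth, here is a minimal sketch of what "escape everything except [A-Za-z0-9,. ]" could look like in Python. The function name `escape_all_but_safe` and the choice of decimal character references (`&#NN;`) are my assumptions for illustration, not necessarily how the approach described above is implemented. It is only meant for output going into HTML text or quoted attribute values; SQL is better handled with parameterized queries.

```python
import re

# Allowlist from the comment above: ASCII letters, digits, comma, period, space.
# Anything else, including non-ASCII letters, counts as unsafe (default deny).
_UNSAFE = re.compile(r"[^A-Za-z0-9,. ]")


def escape_all_but_safe(text: str) -> str:
    """Replace every character outside the allowlist with a decimal
    HTML character reference such as &#60; (hypothetical helper)."""
    return _UNSAFE.sub(lambda m: f"&#{ord(m.group())};", text)


print(escape_all_but_safe("<script>alert('x')</script>"))
# → &#60;script&#62;alert&#40;&#39;x&#39;&#41;&#60;&#47;script&#62;
```

The point of the design is that there is no list of "bad" characters to keep up to date: a newly discovered trick still has to be built out of characters that are already being escaped.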