Which is exactly where the confusion is. The input is perfectly sane; it just isn't SQL or HTML. It is perfectly sane plain text, which can be converted into perfectly sane HTML or perfectly sane SQL. Neither of those is in any way "more sane", it's just the right format for a given use. If you were to put the plain text into a plain text email body, for example, you would not have to do any conversion at all.
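
To make that concrete, here is a minimal sketch in Python, using only the standard library. The table name `comments` and the helpers `to_html`, `store` and `load_first` are made up for illustration; the point is only that the same plain text is stored untouched and converted separately for each place it ends up.

```python
import html
import sqlite3


def to_html(text: str) -> str:
    """Convert plain text to HTML text content (escapes &, <, > and quotes)."""
    return html.escape(text)


def store(conn: sqlite3.Connection, text: str) -> None:
    """Hand the plain text to the database through a bound parameter.

    The driver does the plain-text-to-SQL conversion; we never build
    the SQL string from the input ourselves.
    """
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


def load_first(conn: sqlite3.Connection) -> str:
    """Read the stored text back, exactly as it was entered."""
    return conn.execute("SELECT body FROM comments ORDER BY rowid").fetchone()[0]


# Perfectly sane plain text that happens to contain characters
# with special meaning in HTML and in SQL.
user_input = "Tom & Jerry's <b>\"best\"</b> episode"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")  # hypothetical table
store(conn, user_input)

stored = load_first(conn)       # still plain text, unchanged
html_fragment = to_html(stored)  # converted only when going into HTML
email_body = stored              # plain text email: no conversion at all
```

Note that nothing is "cleaned" on the way in. The database holds the original text, and the HTML form only exists at the moment the text is put into a page.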