UTF-8 is not efficient for random access. I don't have a problem with UTF-8. I hav...

mikeash · on April 13, 2012

No encoding that can handle all the necessary languages will be efficient for random access.

I'm not saying don't think about it. But once you think about it, I think there's really only one sane conclusion to reach.

ww520 · on April 13, 2012

Never say never. UTF-32 handles them just fine.

mikeash · on April 13, 2012

Precomposed versus decomposed accents? Jamo versus precomposed Hangul characters? The Unicode code point is rarely a useful thing to know about on its own, and code that assumes one code point equals one "character", for whatever definition of a character is in use, is likely to work poorly even with UTF-32.
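
As a rough illustration of the point about accents and Hangul, here is a minimal Python sketch using only the standard-library `unicodedata` module. The sample strings and variable names are illustrative choices, not anything from the thread. It shows that the same visible "character" can be one code point or several, so a fixed-width encoding like UTF-32 gives constant-time access to code points but not to what a user would call characters.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9)
precomposed = "\u00e9"
# The same text after canonical decomposition: "e" + combining acute accent (U+0301)
decomposed = unicodedata.normalize("NFD", precomposed)

# The Hangul syllable "한" as a single precomposed code point (U+D55C)
hangul = "\ud55c"
# The same syllable decomposed into its conjoining jamo
jamo = unicodedata.normalize("NFD", hangul)


def utf32_units(s: str) -> int:
    """Number of 32-bit units in the UTF-32 encoding (one per code point)."""
    return len(s.encode("utf-32-le")) // 4


for label, s in [
    ("precomposed e-acute", precomposed),
    ("decomposed e-acute", decomposed),
    ("precomposed Hangul", hangul),
    ("decomposed Hangul", jamo),
]:
    # Each pair renders as one user-visible character,
    # yet the UTF-32 unit counts differ.
    print(f"{label}: {utf32_units(s)} UTF-32 unit(s)")

# The two spellings are canonically equivalent but not equal as raw strings.
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Indexing into the UTF-32 form of the decomposed strings lands on a bare base letter, a combining mark, or a single jamo, which is rarely what the caller wanted. Normalizing to NFC helps for these two cases, but not every combining sequence has a precomposed form, so it is not a general fix.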