
UTF-8 is not efficient for random access.
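To make the random-access cost concrete, here is a minimal sketch (the function name `utf8_nth_codepoint_offset` is hypothetical, for illustration only). Because UTF-8 uses 1 to 4 bytes per code point, finding where code point n starts means walking the bytes from the beginning and counting lead bytes, which is O(n) rather than O(1).

```python
def utf8_nth_codepoint_offset(data: bytes, n: int) -> int:
    """Return the byte offset where code point n starts in UTF-8 data.

    UTF-8 is variable-width, so there is no arithmetic shortcut: we scan
    from the start and count lead bytes (any byte that is not a
    continuation byte of the form 0b10xxxxxx).
    """
    count = -1
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # lead byte: a new code point starts here
            count += 1
            if count == n:
                return offset
    raise IndexError("code point index out of range")


# 'a' is 1 byte, U+00E9 is 2 bytes, U+20AC is 3 bytes, U+1F600 is 4 bytes
text = "a\u00e9\u20ac\U0001F600"
data = text.encode("utf-8")
print([utf8_nth_codepoint_offset(data, i) for i in range(4)])  # [0, 1, 3, 6]
```

Sequential iteration stays cheap; only "jump to code point n" pays the linear scan.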

I don't have a problem with UTF-8. I have a problem with the silver-bullet attitude of advocating one approach for all cases without thought. That's just intellectually lazy.



No encoding that can handle all the necessary languages will be efficient for random access.

I'm not saying don't think about it. But once you think about it, I think there's really only one sane conclusion to reach.


Never say never. UTF-32 handles them just fine.
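A minimal sketch of what UTF-32 buys you (the helper name `utf32_nth_codepoint` is hypothetical): every code point is exactly 4 bytes, so code point n lives at byte offset 4 * n and can be read in constant time. It assumes little-endian UTF-32 without a BOM; note it indexes code points, not user-perceived characters.

```python
import struct


def utf32_nth_codepoint(data: bytes, n: int) -> int:
    """Return code point n from UTF-32LE data in O(1).

    Fixed width: each code point occupies 4 bytes, so its offset is 4 * n.
    """
    if not 0 <= n < len(data) // 4:
        raise IndexError("code point index out of range")
    (cp,) = struct.unpack_from("<I", data, 4 * n)
    return cp


text = "a\u00e9\u20ac\U0001F600"
data = text.encode("utf-32-le")  # explicit endianness, so no BOM is written
print(hex(utf32_nth_codepoint(data, 3)))  # 0x1f600
print(len(data))  # 16 bytes for 4 code points (UTF-8 needs 10)
```

The trade-off is size: this string takes 16 bytes in UTF-32 versus 10 in UTF-8, and pure-ASCII text grows fourfold.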


Precomposed versus decomposed accents? Jamo versus precomposed Hangul characters? The Unicode code point is rarely a useful thing to know about on its own, and code which assumes that one code point equals one "character", for whatever definition of a character is in use, is likely to work poorly with UTF-32.
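A short illustration using Python's standard `unicodedata` module: the same visible "é" can be one code point or two, and a single Hangul syllable normalizes (NFD) into three conjoining jamo. Indexing UTF-32 by code point therefore still splits what a reader sees as one character.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT, two code points

print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# O(1) UTF-32 indexing of the decomposed form yields a bare 'e', not the accented letter
first = decomposed.encode("utf-32-le")[0:4].decode("utf-32-le")
print(first)  # e

# Hangul: the precomposed syllable U+D55C vs. its conjoining jamo sequence
syllable = "\uD55C"
jamo = unicodedata.normalize("NFD", syllable)
print(len(syllable), len(jamo))    # 1 3
print([hex(ord(c)) for c in jamo])  # ['0x1112', '0x1161', '0x11ab']
```

Working with user-perceived characters requires grapheme cluster segmentation (Unicode UAX #29), which Python's standard library does not provide.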



