
Would the following be a correct way to determine whether there's a problem?

* First call setlocale(LC_CTYPE, "en_US.UTF-8")

* Next feed the UTF-8 encoding of every Unicode code point (except the surrogates U+D800 to U+DFFF, which have no valid UTF-8 form) one at a time to mbstowcs() and ensure that the output for each is a wchar_t string of length one

* If all input codepoints numerically match the output wchar_t UTF-32 code units, then the implementation is officially good, and should define __STDC_ISO_10646__?
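
For what it's worth, here is a rough sketch of that check in C. The helper names encode_utf8 and count_mismatches are made up for illustration, and the sketch assumes the locale name is one the system actually provides. It skips U+0000 (its UTF-8 form is an empty C string, so mbstowcs() would report length zero) and the surrogate range.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-8.
   buf must hold at least 5 bytes; returns the byte count written,
   not counting the terminating NUL. */
static size_t encode_utf8(unsigned long cp, char *buf)
{
    size_t n;
    if (cp < 0x80) {
        buf[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        buf[0] = (char)(0xF0 | (cp >> 18));
        buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    buf[n] = '\0';
    return n;
}

/* Hypothetical helper: returns how many code points fail the check,
   or -1 if the requested locale could not be set. */
static long count_mismatches(const char *locale_name)
{
    long bad = 0;
    unsigned long cp;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* Start at 1: U+0000 encodes to an empty C string. */
    for (cp = 1; cp <= 0x10FFFF; cp++) {
        char mb[5];
        wchar_t wc[2];
        size_t len;

        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue; /* surrogates have no valid UTF-8 encoding */

        encode_utf8(cp, mb);
        len = mbstowcs(wc, mb, 2);

        /* Expect exactly one wide character whose value equals cp.
           A 16-bit wchar_t fails here for code points above U+FFFF. */
        if (len != 1 || (unsigned long)wc[0] != cp)
            bad++;
    }
    return bad;
}
```

Calling count_mismatches("en_US.UTF-8") and getting 0 would mean every tested code point round-tripped to the same numeric value in that locale; -1 only means the locale name was not accepted, which says nothing about wchar_t either way.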



I think this is correct, assuming that locale is supported by the implementation and wchar_t is wide enough, but I am by no means an expert on character encodings.


Should work, provided your wchar_t type is at least 21 bits wide.



