Jim Cobban wrote:
> Furthermore UTF-8 and UTF-16 should have nothing to do with the
> internals of the representation of strings inside a C++
> program. It is obviously convenient that a wchar_t * or
> std::wstring should contain one "word" for each external glyph,
> which is not true for either UTF-8 or UTF-16. UTF-8 and UTF-16
> are standards for the external representation of text for
> transmission between applications, and in particular for
> writing files used to carry international text. For example
> UTF-8 is clearly a desirable format for the representation of
> C/C++ programs themselves, because so many of the characters
> used in the language are limited to the ASCII code set, which
> requires only 8 bits to represent in UTF-8.

Just in case anyone thinks that UTF-16 might be a good format for
saving data in files or for data to be sent over a network, here's a
gem from Microsoft:

'The example in the documentation didn't specify Little Endian, so the
Unicode string that the code generates is Big Endian. The SQL Server
Driver for PHP expected [Little Endian], so the data written to SQL
Server is not what was expected. However, because the code to retrieve
the data converts the string from Big Endian back to UTF-8, the
resulting string in the example matches the original string.

'If you change the Unicode charset in the example from "UTF-16" to
"UCS-2LE" or "UTF-16LE" in both calls to iconv, you'll still see the
original and resulting strings match but now you'll also see that the
code sends the expected data to the database.'

http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=3644735&SiteID=1

Andrew.
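
P.S. For anyone who wants to watch the byte-order ambiguity for
themselves, here is a rough sketch (plain C against a POSIX iconv(3),
glibc or GNU libiconv, not the PHP driver the forum post is about)
that dumps the bytes you get for one character, U+00E9, when the
target charset is spelled "UTF-16", "UTF-16LE" and "UTF-16BE". Whether
plain "UTF-16" comes out big-endian, little-endian, or with a BOM is
left to the implementation, which is exactly the trap described above.

    #include <stdio.h>
    #include <string.h>
    #include <iconv.h>

    /* Convert the UTF-8 string "\xC3\xA9" (U+00E9) to the given
       charset and print the resulting bytes in hex. */
    static void dump(const char *tocode)
    {
        char in[] = "\xC3\xA9";
        char out[16];
        char *inp = in, *outp = out;
        size_t inleft = sizeof in - 1;   /* drop the terminating NUL */
        size_t outleft = sizeof out;

        iconv_t cd = iconv_open(tocode, "UTF-8");
        if (cd == (iconv_t)-1) { perror(tocode); return; }
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
            perror("iconv");
        iconv_close(cd);

        printf("%-8s:", tocode);
        for (char *p = out; p < outp; ++p)
            printf(" %02X", (unsigned char)*p);
        putchar('\n');
    }

    int main(void)
    {
        dump("UTF-16");    /* byte order (and BOM) chosen by the library */
        dump("UTF-16LE");  /* byte order pinned down explicitly */
        dump("UTF-16BE");
        return 0;
    }

Compile with something like "cc dump16.c -o dump16", adding -liconv on
systems where iconv lives outside libc; a few older libiconv builds
also declare iconv()'s second argument as const char **, so you may
need a cast there.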