Dallas Clarke wrote:
> Now I have had the time to pull myself off the ceiling, I realise the
> problem is that Unix/GCC is supporting both UTF-8 and UTF-32, while
> Windows is supporting UTF-8 and UTF-16. And the solution is for both
> Unix and Windows to support all three Unicode formats.
>
> I have had to spend the last several days totally writing from scratch
> the UTF-16 string functions, and realise that with a bit of common sense
> every thing can work out okay. Hopefully quick action to move wchar_t to
> 2 bytes and create another type for 4 byte strings, we can see a lot of
> problems solved. Maybe have UTF-16 strings with L"My String" and UTF-32
> with LL"My String" notations.

Changing wchar_t would break the ABI. It isn't going to happen.

> I hope your steering committee can see that there will be lots of UTF-16
> text files out there, with a lot of code required to be written to
> process those files and while UTF-8 will not support many none Latin
> based languages, UTF-32 will not support many none Human base languages
> - i.e. no signal system is fault free.

I don't think that such a change can be decreed by the GCC SC.

I don't understand your claim that "UTF-8 will not support many none
Latin based languages". UTF-8 <http://tools.ietf.org/html/rfc3629>
supports everything from U+0000 to U+10FFFF.

While programs use a variety of internal representations of characters,
successful transmission of data between machines requires a common
interchange format, and UTF-8 is that format.

Andrew.
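
P.S. To illustrate that last point about the range UTF-8 covers, here is
a minimal sketch of an RFC 3629 encoder, showing how every Unicode scalar
value from U+0000 through U+10FFFF fits in one to four bytes. The function
name utf8_encode and the sample code points are mine, purely for
illustration, not taken from any existing library:

  #include <stdint.h>
  #include <stdio.h>

  /* Encode one Unicode scalar value into buf; returns the number of
     bytes written (1-4), or 0 for surrogates (U+D800..U+DFFF) and
     values beyond U+10FFFF, which have no UTF-8 form.  */
  static int utf8_encode(uint32_t cp, unsigned char buf[4])
  {
      if (cp <= 0x7F) {                      /* 1 byte: 0xxxxxxx */
          buf[0] = (unsigned char) cp;
          return 1;
      } else if (cp <= 0x7FF) {              /* 2 bytes: 110xxxxx 10xxxxxx */
          buf[0] = (unsigned char) (0xC0 | (cp >> 6));
          buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
          return 2;
      } else if (cp <= 0xFFFF) {             /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
          if (cp >= 0xD800 && cp <= 0xDFFF)  /* surrogates are not scalar values */
              return 0;
          buf[0] = (unsigned char) (0xE0 | (cp >> 12));
          buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
          return 3;
      } else if (cp <= 0x10FFFF) {           /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
          buf[0] = (unsigned char) (0xF0 | (cp >> 18));
          buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
          return 4;
      }
      return 0;                              /* outside the Unicode range */
  }

  int main(void)
  {
      /* 'A', Greek alpha, a CJK ideograph, and an emoji: Latin and
         non-Latin alike come out as valid UTF-8.  */
      uint32_t samples[] = { 0x41, 0x3B1, 0x4E2D, 0x1F600 };
      unsigned char buf[4];

      for (int i = 0; i < 4; i++) {
          int n = utf8_encode(samples[i], buf);
          printf("U+%04X ->", (unsigned) samples[i]);
          for (int j = 0; j < n; j++)
              printf(" %02X", buf[j]);
          printf("\n");
      }
      return 0;
  }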