On Thu, Aug 21, 2008 at 00:43, Dallas Clarke <DClarke@xxxxxxxxxxxxxx> wrote:
>
> Now I have had the time to pull myself off the ceiling, I realise the
> problem is that Unix/GCC is supporting both UTF-8 and UTF-32, while Windows
> is supporting UTF-8 and UTF-16. And the solution is for both Unix and
> Windows to support all three Unicode formats.
>

Why is the solution to change Windows and GCC, rather than just use the
UTF-8 that's apparently already in both? With combining codepoints, even
UTF-32 is effectively a variable-length encoding (at the glyph level),
so... (the first sketch at the end of this message shows a concrete case).

> I hope your steering committee can see that there will be lots of UTF-16
> text files out there, with a lot of code required to be written to process
> those files, and while UTF-8 will not support many non-Latin-based
> languages, UTF-32 will not support many non-human-based languages - i.e. no
> signal system is fault free.
>

Huh? The latter part of that seems to claim that UTF-16 supports more
languages than UTF-8 or UTF-32, which is clearly wrong: all three are
complete encodings of exactly the same Unicode codepoint space.

Though I've never seen the point of UTF-16 anyway. It can't be transported
by anything that assumes 8-bit-clean, ASCII-compatible text (second sketch
below), and once compressed (as any significant amount of text would be) it
isn't usefully smaller than just using a fixed-length codepoint encoding
(third sketch below).
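
To make the combining-codepoint point concrete, here is a minimal sketch
(my illustration, not code from this thread): the single on-screen glyph
"é" in decomposed form is two codepoints, U+0065 followed by U+0301, so
even UTF-32 needs two code units for one user-perceived character.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* "é" in decomposed (NFD) form: base letter 'e' (U+0065) followed
     * by a combining acute accent (U+0301): one glyph, two codepoints. */
    uint32_t utf32[] = { 0x0065, 0x0301 };   /* 2 x 4-byte code units */
    uint16_t utf16[] = { 0x0065, 0x0301 };   /* 2 x 2-byte code units */
    uint8_t  utf8[]  = { 0x65, 0xCC, 0x81 }; /* 3 x 1-byte code units */

    printf("UTF-32 code units: %zu\n", sizeof utf32 / sizeof utf32[0]);
    printf("UTF-16 code units: %zu\n", sizeof utf16 / sizeof utf16[0]);
    printf("UTF-8  code units: %zu\n", sizeof utf8  / sizeof utf8[0]);
    printf("glyphs on screen : 1\n");
    return 0;
}

So "fixed-width" UTF-32 only fixes the codepoint width; anything that
needs to count or split user-perceived characters still has to walk the
text.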
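
Second sketch (again mine, not from the thread), on the transport point:
even pure-ASCII text picks up 0x00 bytes in UTF-16, so anything built on
NUL-terminated byte strings mangles it, while the same text in UTF-8 is
byte-for-byte identical to ASCII.

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* "Hi" in UTF-16LE: 48 00 69 00. Note the embedded 0x00 bytes. */
    const char utf16le[] = { 0x48, 0x00, 0x69, 0x00 };
    /* "Hi" in UTF-8: identical to the ASCII bytes, no NULs. */
    const char utf8[] = "Hi";

    /* strlen() stops at the first NUL, one byte into the UTF-16 data. */
    printf("strlen() over UTF-16LE bytes: %zu\n", strlen(utf16le)); /* 1 */
    printf("strlen() over UTF-8 bytes   : %zu\n", strlen(utf8));    /* 2 */
    return 0;
}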
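
And a third sketch for the compression claim, if anyone would rather
measure than argue: it widens the same ASCII sample to UTF-16 and UTF-32
code units and deflates both with zlib. Assumes zlib is installed (link
with -lz); the sample string is arbitrary.

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <zlib.h>

static unsigned long deflated_size(const unsigned char *buf, uLong len)
{
    unsigned char out[8192];
    uLongf outlen = sizeof out;
    /* compress2() is zlib's one-shot deflate helper. */
    if (compress2(out, &outlen, buf, len, Z_BEST_COMPRESSION) != Z_OK)
        return 0;
    return outlen;
}

int main(void)
{
    const char *text = "The quick brown fox jumps over the lazy dog. ";
    size_t n = strlen(text);
    uint16_t u16[256];
    uint32_t u32[256];

    /* ASCII codepoints widen trivially to UTF-16/UTF-32 code units. */
    for (size_t i = 0; i < n; i++) {
        u16[i] = (uint16_t)(unsigned char)text[i];
        u32[i] = (uint32_t)(unsigned char)text[i];
    }

    printf("UTF-16: %zu raw -> %lu deflated bytes\n", n * 2,
           deflated_size((const unsigned char *)u16, (uLong)(n * 2)));
    printf("UTF-32: %zu raw -> %lu deflated bytes\n", n * 4,
           deflated_size((const unsigned char *)u32, (uLong)(n * 4)));
    return 0;
}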