On Thu, Aug 21, 2008 at 00:43, Dallas Clarke <DClarke@xxxxxxxxxxxxxx> wrote:
>
> Now I have had the time to pull myself off the ceiling, I realise the
> problem is that Unix/GCC is supporting both UTF-8 and UTF-32, while Windows
> is supporting UTF-8 and UTF-16. And the solution is for both Unix and
> Windows to support all three Unicode formats.
>

Why is the solution to change Windows and GCC, rather than just use the
UTF-8 that's apparently already in both? With combining codepoints, even
UTF-32 is effectively a variable-length encoding (at the glyph level),
so... (the first sketch at the end of this message shows a concrete case).

> I hope your steering committee can see that there will be lots of UTF-16
> text files out there, with a lot of code required to be written to process
> those files, and while UTF-8 will not support many non-Latin-based
> languages, UTF-32 will not support many non-human-based languages - i.e. no
> signal system is fault free.
>

Huh? The latter part of that seems to claim that UTF-16 supports more
languages than UTF-8 or UTF-32, which is clearly wrong: all three are
complete encodings of exactly the same Unicode codepoint space.

Though I've never seen the point of UTF-16 anyway. It can't be transported
by anything that assumes 8-bit-clean, ASCII-compatible text (second sketch
below), and once compressed (as any significant amount of text would be) it
isn't usefully smaller than just using a fixed-length codepoint encoding
(third sketch below).
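
To make the combining-codepoint point concrete, here is a minimal sketch
(my illustration, not code from this thread): the single on-screen glyph
"é" in decomposed form is two codepoints, U+0065 followed by U+0301, so
even UTF-32 needs two code units for one user-perceived character.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* "é" in decomposed (NFD) form: base letter 'e' (U+0065) followed
     * by a combining acute accent (U+0301): one glyph, two codepoints. */
    uint32_t utf32[] = { 0x0065, 0x0301 };   /* 2 x 4-byte code units */
    uint16_t utf16[] = { 0x0065, 0x0301 };   /* 2 x 2-byte code units */
    uint8_t  utf8[]  = { 0x65, 0xCC, 0x81 }; /* 3 x 1-byte code units */

    printf("UTF-32 code units: %zu\n", sizeof utf32 / sizeof utf32[0]);
    printf("UTF-16 code units: %zu\n", sizeof utf16 / sizeof utf16[0]);
    printf("UTF-8  code units: %zu\n", sizeof utf8  / sizeof utf8[0]);
    printf("glyphs on screen : 1\n");
    return 0;
}

So "fixed-width" UTF-32 only fixes the codepoint width; anything that
needs to count or split user-perceived characters still has to walk the
text.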
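
Second sketch (again mine, not from the thread), on the transport point:
even pure-ASCII text picks up 0x00 bytes in UTF-16, so anything built on
NUL-terminated byte strings mangles it, while the same text in UTF-8 is
byte-for-byte identical to ASCII.

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* "Hi" in UTF-16LE: 48 00 69 00. Note the embedded 0x00 bytes. */
    const char utf16le[] = { 0x48, 0x00, 0x69, 0x00 };
    /* "Hi" in UTF-8: identical to the ASCII bytes, no NULs. */
    const char utf8[] = "Hi";

    /* strlen() stops at the first NUL, one byte into the UTF-16 data. */
    printf("strlen() over UTF-16LE bytes: %zu\n", strlen(utf16le)); /* 1 */
    printf("strlen() over UTF-8 bytes   : %zu\n", strlen(utf8));    /* 2 */
    return 0;
}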
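
And a third sketch for the compression claim, if anyone would rather
measure than argue: it widens the same ASCII sample to UTF-16 and UTF-32
code units and deflates both with zlib. Assumes zlib is installed (link
with -lz); the sample string is arbitrary.

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <zlib.h>

static unsigned long deflated_size(const unsigned char *buf, uLong len)
{
    unsigned char out[8192];
    uLongf outlen = sizeof out;
    /* compress2() is zlib's one-shot deflate helper. */
    if (compress2(out, &outlen, buf, len, Z_BEST_COMPRESSION) != Z_OK)
        return 0;
    return outlen;
}

int main(void)
{
    const char *text = "The quick brown fox jumps over the lazy dog. ";
    size_t n = strlen(text);
    uint16_t u16[256];
    uint32_t u32[256];

    /* ASCII codepoints widen trivially to UTF-16/UTF-32 code units. */
    for (size_t i = 0; i < n; i++) {
        u16[i] = (uint16_t)(unsigned char)text[i];
        u32[i] = (uint32_t)(unsigned char)text[i];
    }

    printf("UTF-16: %zu raw -> %lu deflated bytes\n", n * 2,
           deflated_size((const unsigned char *)u16, (uLong)(n * 2)));
    printf("UTF-32: %zu raw -> %lu deflated bytes\n", n * 4,
           deflated_size((const unsigned char *)u32, (uLong)(n * 4)));
    return 0;
}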