Re: Troubles with UTF-8

--On 23 December 2005 11:36 +0100 "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx> wrote:

> A) Character set.  UTF-8 implicitly specifies the use of Unicode/ISO 10646
> which contains 97,000 - and rising - characters.  Some (proposed)
> standards limit themselves to 0000..007F, which is not at all
> international, others to 0000..00FF, essentially Latin-1, which suits many
> Western languages but is not truly international.  Is 97,000 really
> appropriate or should there be a defined subset?

I think Ned has answered most of your other points; I'll chime in on this one.

My opinion: ALL attempts at defining a "useful" character set of any size between 128 and "all you can eat" for use internationally have been dismal failures. Each has been used in some niche; sooner or later there is a need to work outside that box, and gateways or other forms of self-torture result. (Alvestrand's equality: gateways = pain).

At the moment, the only reasonable candidate for an "all you can eat" character set is the Unicode charset. All other alternatives, including the bizarrely byzantine character set switching schemes of ISO 2022, are basically dead in the marketplace.

So there are only two real choices for charset left: ASCII and Unicode.

ASCII is unsuitable for any language except the technologists' simplified version of English. So if you want text, and want it to work internationally, there's only one choice left.
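
A rough illustration of the point, assuming Python 3 and its built-in "ascii", "latin-1" and "utf-8" codecs. The sample strings and the helper name `encodable` are made up for this sketch; they are not from the thread. It checks which of the three repertoires discussed above (0000..007F, 0000..00FF, all of Unicode) can represent each sample:

```python
# Sample strings are illustrative only (English, Norwegian, Greek, Japanese).
samples = ["plain English", "Blåbærsyltetøy", "Ελληνικά", "日本語"]
charsets = ("ascii", "latin-1", "utf-8")


def encodable(text, charset):
    """Return True if every character of text can be represented in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Map each sample to a per-charset yes/no table.
results = {s: {cs: encodable(s, cs) for cs in charsets} for s in samples}

for sample, table in results.items():
    print(sample, table)
```

Only the UTF-8 column comes out true for every sample; the Latin-1 subset handles the Norwegian word but not the Greek or Japanese ones, which is the "need to work outside that box" in miniature.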

Subsets are a mistake.

                           Harald



