Unicode points (Re: IDN security violation? Please comment)

--On Monday, February 21, 2005 13:20:54 -0500 Bruce Lilly <blilly@xxxxxxxxx> wrote:

Unicode code size increased overnight by more than 4
orders of magnitude (a factor of 65536) when it went from 16 bits
(65536 code points) to 32 bits (over 4 billion code points) at the
same time that it incorporated musical notation etc. in contradiction
to the Unicode Design Principles.

Bruce,

it may be nice to check your facts before you trot them out....

At the moment (4.0.1), Unicode has approx. 96,000 code points, and is, according to Unicode, "running out of scripts to encode".

The range of Unicode characters is defined in <http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>, page 24, as 0 to 10FFFF (hex), which is 1,114,111 decimal - quite a bit larger than 65536, but quite a bit smaller than 4 billion.
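
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The names MAX_CODE_POINT and code_space_summary are made up for illustration; the only input is the 0 to 10FFFF (hex) range cited above. It compares the size of that range with the 16-bit and 32-bit figures under discussion.

```python
# Largest code point in the range 0..10FFFF (hex) cited above.
MAX_CODE_POINT = 0x10FFFF


def code_space_summary():
    """Compare the Unicode code space with 16-bit and 32-bit spaces."""
    total = MAX_CODE_POINT + 1  # the range starts at 0, so add one
    return {
        "max_code_point": MAX_CODE_POINT,
        "total_code_points": total,
        # How many times larger than a 16-bit space (65536 values).
        "ratio_to_16_bit": total / 2**16,
        # What fraction of a full 32-bit space this actually is.
        "fraction_of_32_bit": total / 2**32,
    }


summary = code_space_summary()
for name, value in summary.items():
    print(name, value)
```

The range works out to 17 times the 16-bit space, which is well short of a factor of 65536, and it covers only a small fraction of a percent of what 32 bits could address.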

(In my personal opinion, the 16-bit limit was a stupid one in the first place - it's been clear for a long time that 65,536 characters would not be enough to encode the Han characters. Which is why I never believed in UCS-2 as a rational design point.)

And - if you want to take Unicode to task for violating its design principles, it might be nice to say which principle you claim it violates, and which one was violated by including musical notes, but not violated by (say) Dingbats.

Harald, who happens to be a board member of the Unicode Consortium (but does not at all speak for the Consortium)

