Unicode points (Re: IDN security violation? Please comment)

Harald Tveit Alvestrand <harald@xxxxxxxxxxxxx> · Mon, 21 Feb 2005 23:48:37 +0100

--On Monday, February 21, 2005 13:20:54 -0500 Bruce Lilly <blilly@xxxxxxxxx> 
wrote:

> Unicode code size increased overnight by more than 4
> orders of magnitude (a factor of 65536) when it went from 16 bits
> (65536 code points) to 32 bits (over 4 billion code points) at the
> same time that it incorporated musical notation etc. in contradiction
> to the Unicode Design Principles.

Bruce,

it may be nice to check your facts before you trot them out....

At the moment (4.0.1), Unicode has approx. 96,000 assigned characters, and is, 
according to Unicode, "running out of scripts to encode".

The range of Unicode code points is defined in 
<http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>, page 24, as 0 to 
10FFFF(hex), which is 1,114,111 decimal - quite a bit larger than 65,536, 
but quite a bit smaller than 4 billion.
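
For anyone who wants to check the arithmetic, here is a minimal sketch in 
Python. The variable names are purely illustrative and are not taken from 
the Unicode standard. It assumes Python 3.3 or later, where sys.maxunicode 
reports the same upper bound.

```python
import sys

# Upper bound of the Unicode codespace as cited above: 0 to 10FFFF (hex).
MAX_CODE_POINT = 0x10FFFF

# Sizes of a 16-bit and a 32-bit code space, for comparison.
SIZE_16_BIT = 2 ** 16
SIZE_32_BIT = 2 ** 32

# Number of code points in the inclusive range 0..10FFFF.
codespace_size = MAX_CODE_POINT + 1

# True when the codespace is larger than 16 bits but far smaller than 32 bits.
between_16_and_32_bits = SIZE_16_BIT < codespace_size < SIZE_32_BIT

print(MAX_CODE_POINT)                 # 1114111
print(codespace_size)                 # 1114112
print(between_16_and_32_bits)         # True
print(codespace_size / SIZE_16_BIT)   # 17.0, the actual growth factor over 16 bits
print(sys.maxunicode == MAX_CODE_POINT)
```

So the growth over the 16-bit range is a factor of 17, not 65536.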

(In my personal opinion, the 16-bit limit was a stupid one in the first 
place - it's been clear for a long time that 65,536 characters would not be 
enough to encode the Han characters. Which is why I never believed in UCS-2 
as a rational design point.)

And - if you want to take Unicode to task for violating its design 
principles, it might be nice to say which principle you claim it violates, 
and which one was violated by including musical notes, but not violated by 
(say) Dingbats.

Harald, who happens to be a board member of the Unicode Consortium (but 
does not at all speak for the Consortium)

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf