Re: Unicode points

>  Date: 2005-02-21 17:48
>  From: Harald Tveit Alvestrand <harald@xxxxxxxxxxxxx>

> --On Monday, February 21, 2005 13:20:54 -0500 Bruce Lilly <blilly@xxxxxxxxx> 
> wrote:
> 
> > Unicode code size increased overnight by more than 4
> > orders of magnitude (a factor of 65536) when it went from 16 bits
> > (65536 code points) to 32 bits (over 4 billion code points) at the
> > same time that it incorporated musical notation etc. in contradiction
> > to the Unicode Design Principles.
> 
> Bruce,
> 
> it may be nice to check your facts before you trot them out....

I apologize for not being sufficiently clear.
 
> at the moment (4.0.1), Unicode has approx. 96.000 codepoints, and is, 
> according to Unicode, "running out of scripts to encode".

Given the flip-flop on musical notation, I expect that the consortium
will have no trouble finding other non-text things to encode (smileys,
aromatic hydrocarbon chemical symbols (very fertile territory, no pun
intended), dance notation, logos, traffic symbols, etc.).
 
> The range of Unicode characters is defined in 
> <http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>, page 24, as 0 to 
> 10FFFF(hex), which is 1.114.111 decimal - quite a bit larger than 65536, 
> but quite a bit smaller than 4 billion.

Now, yes, but I have about as much faith that that won't expand as I
now have in the "Unicode characters are sixteen bits" statement which
was true in its day.
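
For reference, the three code-space sizes under discussion can be checked
with a few lines of Python. This is only an arithmetic sketch; the constant
names are mine, and the upper bound 10FFFF is the figure quoted above from
chapter 2 of Unicode 4.0.

```python
# Sizes of the three code spaces mentioned in this thread.
BMP_SIZE = 1 << 16                  # 16-bit code space: 65,536 values
MAX_CODE_POINT = 0x10FFFF           # upper bound quoted from Unicode 4.0, ch. 2
UNICODE_SIZE = MAX_CODE_POINT + 1   # code points 0..10FFFF inclusive
UINT32_SIZE = 1 << 32               # 32-bit integer: 4,294,967,296 values

print(MAX_CODE_POINT)               # 1114111
print(UNICODE_SIZE // BMP_SIZE)     # 17
print(UINT32_SIZE // BMP_SIZE)      # 65536
print(MAX_CODE_POINT.bit_length())  # 21
```

So the currently defined range needs 21 bits and is 17 times the 16-bit
space, while a full 32-bit space would be 65536 times the 16-bit space.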

A related group of issues:
o 16-bit Unicode matched well with 16-bit wchar_t (I suspect not
  coincidentally, but I don't have evidence at my fingertips).
o the next larger predefined data type beyond 16 bits (65536
  values) is 32 bits (4 billion+ values) in many programming
  languages.
o while the raw data size doubles in going from 16 bits per character
  to 32 bits, the size of tables (normalization, etc.) indexed by
  character increases by more than 4 orders of magnitude. [yes,
  table compression can be used -- provided the locations and sizes
  of "holes" are guaranteed -- but that requires additional
  computational power] 
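
To make the table-size point concrete, here is a rough Python sketch of one
common compression scheme, a two-stage table that stores each distinct
256-entry block only once. It is an illustration under stated assumptions,
not any particular library's implementation: the block size, the function
names, and the toy property (ASCII uppercase) are all hypothetical.

```python
import math
from array import array

BLOCK_BITS = 8                      # assumption: 256 code points per block
BLOCK_SIZE = 1 << BLOCK_BITS

def build_two_stage(lookup, limit):
    """Compress a dense per-code-point table into (index, blocks)."""
    index = array("H")              # stage 1: block number for each 256-wide range
    blocks = []                     # stage 2: each distinct block stored once
    seen = {}
    for start in range(0, limit, BLOCK_SIZE):
        block = tuple(lookup(cp) for cp in range(start, start + BLOCK_SIZE))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def get(index, blocks, cp):
    # Two lookups plus a shift and a mask, versus one lookup in a flat table.
    return blocks[index[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)]

def is_ascii_upper(cp):
    # Toy property standing in for a real character property.
    return 1 if 0x41 <= cp <= 0x5A else 0

index, blocks = build_two_stage(is_ascii_upper, 0x110000)

print(round(math.log10(2**32 / 2**16), 2))        # 4.82 orders of magnitude
print(len(index), len(blocks))                    # 4352 2
print(len(index) + len(blocks) * BLOCK_SIZE)      # 4864 stored entries
print(get(index, blocks, ord("A")), get(index, blocks, 0x10FFFF))  # 1 0
```

The saving depends entirely on large ranges sharing identical blocks, which
is the "holes" caveat above, and every access pays for the extra
indirection.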

> And - if you want to call Unicode to task for violating its design 
> principles, it might be nice to say which principle you claim it violates, 
> and which one was violated by including musical notes, but not violated by 
> (say) Dingbats.

I did; in case you missed it, I quoted from the Unicode Standard
itself, viz. "Graphologies unrelated to text, such as musical and
dance notations, are outside the scope of the Unicode Standard".
That appeared in the description of the "plain text" principle,
before that sentence was elided following the abandonment of that
principle. 

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
