Re: Unicode points

"Mark Davis" <mark.davis@xxxxxxxxx> · Thu, 24 Feb 2005 15:48:12 -0800

One other item.

> o while the raw data size doubles in going from 16 bits per character
>   to 32 bits, the size of tables (normalization, etc.) indexed by
>   character increases by more than 4 orders of magnitude. [yes,
>   table compression can be used -- provided the locations and sizes
>   of "holes" is guaranteed -- but that requires additional
>   computational power]

That is irrelevant, since code points are limited to 0..0x10FFFF.
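To put rough numbers on this, here is a minimal sketch that compares flat tables holding one entry per code point. The constant and function names are illustrative only. The 32-bit case is the hypothetical unbounded code space, which is not what Unicode defines.

```python
# Entry counts for a flat table indexed directly by code point.
BMP_ENTRIES = 0x10000            # 16-bit code space: 65,536 entries
UNICODE_ENTRIES = 0x10FFFF + 1   # actual code space 0..0x10FFFF: 1,114,112 entries
UCS4_ENTRIES = 2 ** 32           # hypothetical full 32-bit space (assumption, not Unicode)


def growth_factor(new_entries, old_entries):
    """Return how many times larger the new flat table is than the old one."""
    return new_entries / old_entries


print(growth_factor(UNICODE_ENTRIES, BMP_ENTRIES))  # 17.0
print(growth_factor(UCS4_ENTRIES, BMP_ENTRIES))     # 65536.0
```

Even with no compression at all, a table covering 0..0x10FFFF is 17 times the size of a 16-bit one. The "more than 4 orders of magnitude" figure only follows if the full 32-bit range had to be indexed.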

And, by the way, this is old news, since that change was made in Unicode 3.0
(1999). There are many different ways to adapt tables to work with large
numbers of code points -- they are hardly rocket science. One can choose
between structures tuned for performance and those tuned for footprint.
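
One common way to adapt such tables is a two-stage lookup, in which identical blocks of values are stored only once. The following is a minimal sketch under stated assumptions, not any particular library's implementation: the block size of 256, the function names, and the sample property values are all hypothetical.

```python
BLOCK_SHIFT = 8                  # 256 code points per block (an arbitrary choice)
BLOCK_SIZE = 1 << BLOCK_SHIFT
MAX_CODE_POINT = 0x10FFFF


def build_two_stage(values, default=0):
    """Build (index, blocks) from a dict mapping code point -> property value.

    Blocks with identical contents are stored once and shared via the index.
    """
    block_count = (MAX_CODE_POINT + 1) >> BLOCK_SHIFT
    blocks = []      # stage 2: the unique blocks of values
    block_ids = {}   # block contents -> position in `blocks`
    index = []       # stage 1: one block id per 256-code-point range
    for b in range(block_count):
        base = b << BLOCK_SHIFT
        block = tuple(values.get(base + i, default) for i in range(BLOCK_SIZE))
        if block not in block_ids:
            block_ids[block] = len(blocks)
            blocks.append(block)
        index.append(block_ids[block])
    return index, blocks


def lookup(index, blocks, cp):
    """Look up the value for code point `cp` with two array accesses."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("code point out of range")
    return blocks[index[cp >> BLOCK_SHIFT]][cp & (BLOCK_SIZE - 1)]


# Hypothetical sparse property: only two code points carry a non-default value.
sample = {0x41: 1, 0x1D11E: 2}
index, blocks = build_two_stage(sample)
print(lookup(index, blocks, 0x1D11E))   # 2
print(len(index), len(blocks))          # 4352 3
```

The lookup cost is two array accesses plus a shift and a mask. A smaller block size usually trades a larger index for better sharing, which is the performance-versus-footprint choice mentioned above.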

The Unicode Standard is not without its fair share of warts -- nothing would
be without them that had to deal with the complexity of all human languages,
and compatibility with the legacy charsets that predated Unicode -- but
Bruce's points are incorrect. Taking a look at something that explains the
standard (Richard Gillam's book comes to mind) may help avoid spreading
future misinformation.

--Mark

----- Original Message ----- 
From: "Peter Constable" <petercon@xxxxxxxxxxxxx>
To: <ietf@xxxxxxxx>
Sent: Thursday, February 24, 2005 15:15
Subject: Re: Unicode points

> From: Bruce Lilly <blilly@xxxxxxxxx>

> I apologize for not being sufficiently clear.

But part of the issue appears to be one of being sufficiently informed.

> Given the flip-flop on musical notation, I expect that the consortium
> will have no trouble finding other non-text things to encode (smileys,
> aromatic hydrocarbon chemical symbols (very fertile territory, no pun
> intended), dance notation, logos, traffic symbols, etc.).

There has been no flip-flop on such things. There were never any
guarantees that musical symbols would not be part of the UCS. There will
be further symbols added to the UCS, and there is no certainty of
exactly what, but it is by no means open-ended.

> > The range of Unicode characters is defined in
> > <http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf>, page 24, as 0 to
> > 10FFFF(hex), which is 1.114.111 decimal - quite a bit larger than 65536,
> > but quite a bit smaller than 4 billion.
>
> Now, yes, but I have about as much faith that that won't expand as I
> now have in the "Unicode characters are sixteen bits" statement which
> was true in its day.

That merely indicates that you are not fully informed about the
development of Unicode and ISO 10646. Nothing whatsoever has happened to
increase the likelihood of the codespace ever going beyond U+10FFFF.
Rather, much has been done to ensure that it does not, as reflected by
recent action within JTC1/SC2/WG2 to that effect.

> I did; in case you missed it, I quoted from the Unicode Standard
> itself, viz. "Graphologies unrelated to text, such as musical and
> dance notations, are outside the scope of the Unicode Standard".

That means that musical or dance notation and complete notational
systems are beyond the scope of the Unicode Standard itself, as are
mathematical formulas. That does *not* mean that the text elements --
the individual symbols -- that are used in those notation systems are
necessarily out of scope for the Unicode Standard.

> That appeared in the description of the "plain text" principle,
> before that sentence was elided following the abandonment of that
> principle.

You seem to think that some principle has been abandoned, but it has
not.

Peter Constable

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
