Re: Convert UTF-8 to PHP defines

At 8:52 PM +0200 5/28/10, Nisse Engström wrote:
On Fri, 28 May 2010 11:13:35 -0400, tedd wrote:

 > As is my understanding, UTF-8 will accommodate all the languages
 > (glyphs) of the world and then some. It will be a while before we
 > need UTF-16 or UTF-32 but those are just larger super-sets.

*blink*

They are all capable of representing the full Unicode
range, which is restricted to U+0000 - U+10ffff.

The theoretical limits are:

  UTF-8   [0 - 7fffffff]
  UTF-16  [0 -   10ffff]
  UTF-32  [0 - ffffffff]

Also, there are many, many, *many* more glyphs than
characters (code points) in the world. As an example,
www.fonts.com lists 165,125 fonts. Every one has a
*different* glyph for the character "A"...

/Nisse

*blink* *blink*

As you say, UTF-8 has a range of 0 to 7FFFFFFF.

Forgive me, but isn't that a maximum of 2,147,483,647 (DEC), or roughly 2.1 billion possible code points?

Please note that 165,125 * 48 (upper/lower case) is only 7,926,000 code points -- IF -- each letter of each font were to have its own code point, which is not the case for Unicode.
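
For anyone who wants to check the arithmetic, here is a minimal sketch. It is written in Python purely for illustration, and the variable names are my own invention, not anything from the thread:

```python
# Upper end of the theoretical UTF-8 range quoted above.
utf8_max = 0x7FFFFFFF
# Values 0 through utf8_max inclusive, so one more than the maximum.
utf8_count = utf8_max + 1

# Upper end of the actual Unicode range (U+10FFFF), as quoted above.
unicode_max = 0x10FFFF
unicode_count = unicode_max + 1

# Hypothetical worst case: every letter of every font gets its own code point.
fonts = 165125
letters_per_font = 48  # the per-font figure used in this message
glyph_total = fonts * letters_per_font

print(f"{utf8_max:,}")       # 2,147,483,647
print(f"{unicode_count:,}")  # 1,114,112
print(f"{glyph_total:,}")    # 7,926,000
```

The 48-letters-per-font figure is just the assumption used in this message; change it and the total scales accordingly.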

Code points are assigned to specific char sets that belong to specific language sets, such as English being assigned to the code point range that is common with ASCII. From that, we can have as many fonts as your software can handle. However, ASCII 65 DEC (41 HEX) or code point 65 (41 HEX) is still tied to the letter "A" regardless of whether it is set in Helvetica or Times. So, don't confuse code points with fonts.
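
To illustrate the point, here is a small sketch (again in Python, purely as an example; the variable names are hypothetical). The code point of a character is a property of the text itself, and nothing about a font enters into it:

```python
# The letter "A" maps to code point 65 (41 HEX) no matter how it is drawn.
letter = "A"
code_point = ord(letter)
print(code_point, hex(code_point))  # 65 0x41

# In the ASCII range, the UTF-8 encoding is the same single byte.
ascii_bytes = letter.encode("utf-8")
print(ascii_bytes)  # b'A'

# Outside ASCII, one code point becomes several UTF-8 bytes.
# U+00F6 (o with diaeresis) encodes as the two bytes C3 B6.
o_umlaut = "\u00f6"
umlaut_bytes = o_umlaut.encode("utf-8")
print(umlaut_bytes.hex())  # c3b6
```

The font only decides which glyph gets drawn for code point 65; the number stored in the text stays the same.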

If you spend some time looking at the numerous char sets that Unicode offers you will see that just about every symbol known to man has been cataloged -- even Klingon was considered. From Dingbats to Architectural symbols, from Simplified Chinese to Traditional Chinese, from Greek to Cherokee, from the skull and crossbones to the yin/yang symbol, just about every language in the world and every glyph known to man has been included -- a truly massive project.

IMO, it will be a while before we use up the whole range that Unicode code points provide.

Cheers,

tedd

--
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


