Re: Convert UTF-8 to PHP defines

Nisse Engström <news.NOSPAM.0ixbtqKe@xxxxxxxx> · Fri, 28 May 2010 20:52:04 +0200

On Fri, 28 May 2010 11:13:35 -0400, tedd wrote:

> Bob wrote:
> 
>>>The real question is whether unicode is even relevant now that the UTF
>>>series is available.
> 
> Ashley answered:
> 
>>Bob, UTF is unicode (Unicode Transformation Format)

Or more precisely, UTF-{8,16,32} are different ways to
serialize Unicode code points into sequences of octets,
which is what makes it possible to store and transmit
Unicode data.
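
As a small illustration (a sketch in Python; the sample
character U+20AC is an arbitrary choice of mine, and the
big-endian variants are used only to avoid a byte order
mark), here is one code point serialized three ways:

```python
# Serialize one code point (U+20AC EURO SIGN, chosen arbitrarily) three ways.
ch = "\u20ac"

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")   # big-endian, no byte order mark
utf32 = ch.encode("utf-32-be")   # big-endian, no byte order mark

# Print the octets of each serialization as hex.
for name, octets in (("UTF-8", utf8), ("UTF-16", utf16), ("UTF-32", utf32)):
    print(f"{name:7} {octets.hex(' ')}")
```

The code point is the same in every case; only the octet
sequence differs (three, two and four octets here).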

> Yes, Ashley is correct. UTF-8 is Unicode, as are UTF-16 and UTF-32,
> which all use a different number of bytes for each code point. Both
> UTF-8 and UTF-16 are variable length whereas UTF-32 is a fixed length
> of four bytes per code point.
> 
> As is my understanding, UTF-8 will accommodate all the languages
> (glyphs) of the world and then some. It will be a while before we
> need UTF-16 or UTF-32 but those are just larger super-sets.

*blink*

They are all capable of representing the full Unicode
range, which is restricted to U+0000 - U+10ffff.
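
A quick way to check this, sketched in Python (the codec
names are Python's, not anything from this thread): the
highest code point round-trips through all three forms,
and anything above it is rejected outright.

```python
# The highest Unicode code point survives a round trip in every UTF.
top = chr(0x10FFFF)

lengths = {}
for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    octets = top.encode(codec)
    assert octets.decode(codec) == top
    lengths[codec] = len(octets)
    print(codec, len(octets), "octets")

# One past the end of the range is not a code point at all.
try:
    chr(0x110000)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```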

The theoretical limits are:

  UTF-8   [0 - 7fffffff]
  UTF-16  [0 -   10ffff]
  UTF-32  [0 - ffffffff]
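
Those figures fall out of the bit layouts. A sketch of the
arithmetic in Python, assuming the UTF-8 figure refers to
the original six-octet design (RFC 2279) and the UTF-16
figure to a single surrogate pair; the variable names are
mine:

```python
# UTF-8, six-octet form: 1 payload bit in the lead octet (1111110x)
# plus 5 continuation octets carrying 6 bits each = 31 bits.
utf8_max = (1 << (1 + 5 * 6)) - 1

# UTF-16: a surrogate pair carries 10 + 10 bits, offset by 0x10000.
utf16_max = 0x10000 + ((0x3FF << 10) | 0x3FF)

# UTF-32: one 32-bit code unit.
utf32_max = (1 << 32) - 1

for name, limit in (("UTF-8", utf8_max), ("UTF-16", utf16_max), ("UTF-32", utf32_max)):
    print(f"{name:7} [0 - {limit:8x}]")
```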

Also, there are many, many, *many* more glyphs than
characters (code points) in the world. As an example,
www.fonts.com lists 165,125 fonts. Every one has a
*different* glyph for the character "A"...

/Nisse

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php