Re: languages and PHP

At 11:09 AM +0100 10/2/07, Colin Guthrie wrote:
tedd wrote:
 Isn't UTF-8 the big fish here?

 Sure there's UTF-16 and larger, but everything else is a subset of UTF-8,
 is it not?

 So, what's the problem if you get a character defined by ISO -- it's
 still within the UTF-8 super-group, right?

Individual characters are sometimes OK, but it's the sequence of
characters that could be invalid.

UTF-8 works by using special bits at the MSB end of the byte to say, "I
can't represent this character in one byte, I need to use 2 bytes (or 3
or 4 bytes)".

In a multi-byte sequence the MSB end of all the bytes must follow a
pre-defined scheme. If they do not, the sequence is syntactically
invalid UTF-8.

So it's more than just individual characters: the order of the bytes is
important.
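
As a rough illustration of the bit scheme described above, here is a minimal PHP sketch. The function name utf8_structure_ok() is made up for this example and is not part of PHP or mbstring. It checks only the lead-byte and continuation-byte patterns, so it will still accept some sequences that a strict validator rejects (overlong forms, surrogates).

```php
<?php
// Hypothetical helper, for illustration only: checks just the MSB patterns.
function utf8_structure_ok($bytes)
{
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {                    // 0xxxxxxx: one byte (ASCII)
            $extra = 0;
        } elseif (($b & 0xE0) === 0xC0) {   // 110xxxxx: starts a 2-byte sequence
            $extra = 1;
        } elseif (($b & 0xF0) === 0xE0) {   // 1110xxxx: starts a 3-byte sequence
            $extra = 2;
        } elseif (($b & 0xF8) === 0xF0) {   // 11110xxx: starts a 4-byte sequence
            $extra = 3;
        } else {
            return false;                   // stray continuation byte or invalid lead
        }
        if ($i + $extra >= $len) {
            return false;                   // sequence is cut off at end of input
        }
        for ($k = 1; $k <= $extra; $k++) {
            // every following byte must look like 10xxxxxx
            if ((ord($bytes[$i + $k]) & 0xC0) !== 0x80) {
                return false;
            }
        }
        $i += $extra + 1;
    }
    return true;
}

var_dump(utf8_structure_ok("caf\xC3\xA9"));         // bool(true):  e-acute as UTF-8
var_dump(utf8_structure_ok("caf\xE9 au lait"));     // bool(false): e-acute as ISO-8859-1
```

The second call shows the point of the thread: the ISO-8859-1 byte 0xE9 looks like the start of a 3-byte sequence, but the bytes after it do not follow the continuation pattern.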

Hope that explains it (although probably a bad explanation as I'm very
tired right now!).

Col


Ah, I see what you're saying. I've run into that before when studying Unicode. The mb_ series of functions deals with encodings larger than ASCII, but I don't know of any that deal with character sequences/combinations or right-to-left/left-to-right reading. That's all Greek to me, pardon the pun.
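
For the narrower question of whether a byte sequence is valid UTF-8 (as opposed to combining characters or text direction), mbstring's mb_check_encoding() appears to be the relevant function. A small sketch, assuming the mbstring extension is loaded; the sample strings are made up for illustration:

```php
<?php
// Assumes the mbstring extension is available.
$latin1 = "caf\xE9";      // "cafe" with e-acute in ISO-8859-1: a lone 0xE9 byte
$utf8   = "caf\xC3\xA9";  // the same word in UTF-8: e-acute is the two bytes C3 A9

var_dump(mb_check_encoding($utf8, 'UTF-8'));    // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8'));  // bool(false)

// Converting the bytes, rather than just relabelling them, yields valid UTF-8.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8);                 // bool(true)
```

So an ISO-8859-1 string is not automatically valid UTF-8 once it contains anything beyond ASCII; it has to be converted first.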

Cheers,

tedd

--
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

