Re: languages and PHP

tedd wrote:
> Isn't UTF-8 the big fish here?
> 
> Sure there's UTF-16 and larger, but everything else is a subset of UTF-8,
> is it not?
> 
> So, what's the problem if you get a character defined by ISO -- it's
> still within the UTF-8 super-group, right?

Individual characters are sometimes OK (plain ASCII bytes are the same
in UTF-8), but it's the sequence of bytes that could be invalid.

UTF-8 works by using special bits at the MSB end of the byte to say, "I
can't represent this character in one byte, I need to use 2 bytes (or 3
or 4 bytes)". A single character takes between 1 and 4 bytes.

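In a multi-byte sequence the MSB end of all the bytes must follow a
pre-defined scheme: the first byte starts with 110, 1110 or 11110 (for
a 2, 3 or 4 byte sequence) and every following byte starts with 10. If
they do not, they are syntactically invalid UTF-8.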
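
To make the bit scheme concrete, here is a rough sketch in Python (not
PHP, purely for brevity). The function name is made up for this
example, and it only checks the leading bits of each byte; it does not
reject overlong encodings or surrogate code points, which a real
validator also has to do.

```python
def is_structurally_valid_utf8(data: bytes) -> bool:
    """Check only the leading-bit pattern of each byte (hypothetical helper)."""
    remaining = 0  # continuation bytes still expected for the current character
    for byte in data:
        if remaining:
            # Every continuation byte must look like 10xxxxxx.
            if (byte & 0b11000000) != 0b10000000:
                return False
            remaining -= 1
        elif (byte & 0b10000000) == 0:
            remaining = 0  # 0xxxxxxx: single-byte (ASCII) character
        elif (byte & 0b11100000) == 0b11000000:
            remaining = 1  # 110xxxxx: start of a 2-byte sequence
        elif (byte & 0b11110000) == 0b11100000:
            remaining = 2  # 1110xxxx: start of a 3-byte sequence
        elif (byte & 0b11111000) == 0b11110000:
            remaining = 3  # 11110xxx: start of a 4-byte sequence
        else:
            # A stray continuation byte, or a lead byte UTF-8 never uses.
            return False
    # Input that ends in the middle of a sequence is also invalid.
    return remaining == 0
```

In PHP itself you would normally not hand-roll this; the mbstring
extension's mb_check_encoding($string, 'UTF-8') does a full check.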

So it's more than just individual bytes; the order they come in is
important.
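
As a small illustration of the original question (again in Python, with
a helper name invented for the example): the same text encoded as
ISO-8859-1 and as UTF-8 gives different bytes, and the ISO-8859-1 bytes
do not decode as UTF-8.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "café"
latin1_bytes = text.encode("iso-8859-1")  # the "é" is the single byte 0xE9
utf8_bytes = text.encode("utf-8")         # the "é" is the two bytes 0xC3 0xA9

print(decodes_as_utf8(utf8_bytes))    # valid UTF-8
print(decodes_as_utf8(latin1_bytes))  # 0xE9 announces a 3-byte sequence that never arrives
```

So an ISO-8859-1 string is only safe to treat as UTF-8 when it happens
to contain nothing but ASCII.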

Hope that explains it (although probably a bad explanation as I'm very
tired right now!).

Col

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

