tedd wrote:
> Isn't UTF-8 the big fish here?
>
> Sure there's UTF-16 and larger, but everything else is a subset of UTF-8,
> is it not?
>
> So, what's the problem if you get a character defined by ISO -- it's
> still within the UTF-8 super-group, right?

Individual characters are sometimes OK (plain ASCII is byte-for-byte
identical in UTF-8), but it's the sequence of bytes that could be invalid.

UTF-8 works by using special bits at the MSB end of the byte to say, "I
can't represent this character in one byte, I need to use 2 bytes (or 3,
or 4)".

In a multi-byte sequence the MSB end of all the bytes must follow a
pre-defined scheme: the first byte starts with 110, 1110 or 11110 to
announce a 2-, 3- or 4-byte sequence, and every following byte in that
sequence starts with 10. If they do not, the data is syntactically
invalid UTF-8. A character from an ISO-8859 set above 0x7F is a single
byte with the top bit set, so it looks like the start (or middle) of a
multi-byte sequence that the bytes after it usually don't complete.

So it's more than just individual characters; the order of the bytes is
important.

Hope that explains it (although probably a bad explanation as I'm very
tired right now!).

Col

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
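To make the bit scheme described above a bit more concrete, here is a rough sketch in Python (chosen purely for illustration; nothing in the thread uses it). The function names `utf8_sequence_length` and `follows_utf8_bit_pattern` are made up for this example. It only checks the leading-bit pattern of each byte; it does not reject overlong encodings, surrogates, or code points above U+10FFFF, so treat it as a teaching aid and not as a full validator.

```python
def utf8_sequence_length(lead: int) -> int:
    """Length announced by a lead byte, or 0 if the byte cannot start a sequence.

    Hypothetical helper for illustration only.
    """
    if (lead & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: plain ASCII
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: 2-byte sequence
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: 3-byte sequence
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110xxx: 4-byte sequence
        return 4
    return 0  # 10xxxxxx (stray continuation byte) or 11111xxx (never valid)


def follows_utf8_bit_pattern(data: bytes) -> bool:
    """Check only the MSB scheme: lead byte, then the right number of 10xxxxxx bytes."""
    i = 0
    while i < len(data):
        length = utf8_sequence_length(data[i])
        if length == 0:
            return False
        tail = data[i + 1 : i + length]
        if len(tail) != length - 1:  # sequence cut off at end of input
            return False
        if any((b & 0b1100_0000) != 0b1000_0000 for b in tail):
            return False  # a following byte does not start with 10
        i += length
    return True


# "é" is the single byte 0xE9 in ISO-8859-1. 0xE9 is 1110 1001, which in
# UTF-8 announces a 3-byte sequence, but the next two bytes are ASCII.
latin1_bytes = "café au lait".encode("latin-1")
utf8_bytes = "café au lait".encode("utf-8")

print(follows_utf8_bit_pattern(latin1_bytes))
print(follows_utf8_bit_pattern(utf8_bytes))
```

For real code, the language's own decoder is the better check: in Python, `data.decode("utf-8")` raises `UnicodeDecodeError` on invalid input, and it also catches the cases this sketch ignores.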