Re: When is "z" != "z" ?

Rasmus Lerdorf <rasmus@xxxxxxxxxxx> · Mon, 05 Jun 2006 18:54:11 -0700

tedd wrote:
For example, the Unicode issue was raised during this discussion -- if PHP doesn't consider the numeric relationship of characters, then I see a big problem waiting in the wings. Because if we're having these types of discussions with just considering 00-7F characters, then I can only guess at what's going to happen when we start considering 000000-FFFFFF code-points.

Now, was that enough said?  :-)

I don't think you really understand this.  < and > are collation 
operators when they operate on strings.  They have absolutely nothing to 
do with the numeric values of the characters.  It just so happens that 
for English text in ISO-8859-1 the collation order matches the numeric 
order of the character values, but you can think of that as dumb luck.

To better understand this, I suggest you start reading here:

  http://icu.sourceforge.net/userguide/Collate_Intro.html

Note one of the points on that page: in Lithuanian, 'y' falls 
between 'i' and 'k'.  So even without going into Unicode and just using 
low ASCII, you have these issues.

Now, until we get to PHP 6, we don't have decent Unicode support and we 
don't have locale-aware operators.  For now you have to call strcoll() 
manually to get locale-aware comparison.  That is going to change: the 
ICU collation algorithms will be available, and for Unicode strings 
collation will be automatic.  You can still use binary strings if you 
don't want locale-aware collation, of course.
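
As a rough sketch of the manual strcoll() route, something like the 
following should work.  The helper name collate_less() is made up for 
illustration, and the locale names 'lt_LT.UTF-8' and 'lt_LT' are 
assumptions; what is actually installed varies from system to system, 
so the Lithuanian part is skipped if setlocale() fails.

```php
<?php
// Sketch only. collate_less() is a hypothetical helper, not a PHP built-in.
// It compares two strings under the current LC_COLLATE locale.
function collate_less($a, $b)
{
    return strcoll($a, $b) < 0;
}

// In the "C" locale strcoll() behaves like strcmp(): plain byte order.
setlocale(LC_COLLATE, 'C');
var_dump(collate_less('i', 'y')); // bool(true)
var_dump(collate_less('y', 'k')); // bool(false)

// Assumption: a Lithuanian locale may or may not be installed here.
// setlocale() returns false when none of the requested names is available.
if (setlocale(LC_COLLATE, 'lt_LT.UTF-8', 'lt_LT') !== false) {
    // Under Lithuanian rules 'y' is expected to sort between 'i' and 'k'.
    var_dump(collate_less('y', 'k'));

    // strcoll() can also be used directly as a sort callback.
    $words = array('kava', 'yra', 'ir');
    usort($words, 'strcoll');
    print_r($words);
} else {
    echo "Lithuanian locale not installed; skipping.\n";
}
```

The point is only that the ordering comes from the locale set with 
setlocale(), not from the byte values, as soon as you leave the "C" 
locale.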

-Rasmus

--
PHP General Mailing List (http://www.php.net/)