Re: I lied, another question / problem

# neuhauser@xxxxxxxxxx / 2007-01-21 10:48:30 +0000:
> # jochem@xxxxxxxxxxxxx / 2007-01-21 00:11:13 +0100:
> > Roman Neuhauser wrote:
> > > # jochem@xxxxxxxxxxxxx / 2007-01-17 16:59:26 +0100:
> > >> wouldn't it be fair to assume (safety through paranoia) that
> > >> ctype_alnum() would suffer the same problem? (given the manual's
> > >> indication that ctype_alnum() and the offending regexp are equivalent?)
> > > 
> > > isalnum(3) uses isalpha(3) and isdigit(3), so yes, their results are
> > > locale-dependent (LC_CTYPE, see setlocale(3)), but don't depend on
> > > collating sequence. 
> > 
> > so really the doc's are slightly misleading or even incorrect,
> 
> Slightly, in a usually-behaves-as-described-but-for-different-reasons
> way.
> 
> > as a side note: do you have any real world example of where this
> > collation issue might actually bite someone making use of the aforementioned
> > regexp range?
> 
> Not off the top of my head. :(

Trying the Czech locale, I've come across some unexpected behavior.
(I normally run with the environment values shown below.)

0xE8 is c caron; in Czech it sorts between c and d, but not on this computer.
0xBE is z caron; it sorts just after z.
I'd expect [a-z] to match 0xE8 under cs_CZ.ISO8859-2, but it does not.

LANG=cs_CZ.ISO8859-2
LC_COLLATE=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_MESSAGES=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
LC_TIME=en_US.ISO8859-1

roman@dagan ~/tmp/blemc 1042:0 > uname -srm
FreeBSD 6.1-PRERELEASE amd64
roman@dagan ~/tmp/blemc 1043:0 > cat ./collseq.php
#!/usr/bin/env php
<?php

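// Report how the single byte $c is classified under locale $l.
function f($c, $l)
{
    printf("char=%c locale=%s\n", $c, $l);
    // Switch both collation and character classification to $l.
    setlocale(LC_COLLATE, $l);
    setlocale(LC_CTYPE,   $l);

    // Range match, POSIX class match, and the ctype equivalent.
    printf("[a-z]       = %s\n", var_export(preg_match('~[a-z]~', chr($c)), 1));
    printf("[[:lower:]] = %s\n", var_export(preg_match('~[[:lower:]]~', chr($c)), 1));
    printf("islower(3)  = %s\n", var_export(ctype_lower(chr($c)), 1));
    print "\n";
}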

f(0xE8, 'C'); f(0xE8, 'cs_CZ.ISO8859-2');
f(0xBE, 'C'); f(0xBE, 'cs_CZ.ISO8859-2');

roman@dagan ~/tmp/blemc 1044:0 > ./collseq.php
char=č locale=C
[a-z]       = 0
[[:lower:]] = 0
islower(3)  = false

char=č locale=cs_CZ.ISO8859-2
[a-z]       = 0
[[:lower:]] = 1
islower(3)  = true

char=ž locale=C
[a-z]       = 0
[[:lower:]] = 0
islower(3)  = false

char=ž locale=cs_CZ.ISO8859-2
[a-z]       = 0
[[:lower:]] = 1
islower(3)  = true
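
A possible reading of the [a-z] rows, as far as I can tell: PCRE compares
a range by the byte values of its endpoints and does not consult
LC_COLLATE, so only 0x61..0x7A can match, whatever the locale is. The
sketch below checks that assumption without touching the locale at all;
in_byte_range() is a hypothetical helper made up for illustration, not
part of PHP.

```php
<?php
// Sketch: compare a plain byte-value range test with the [a-z] regexp.
// in_byte_range() is a hypothetical helper, not a PHP built-in.
function in_byte_range($c, $lo, $hi)
{
    return ord($lo) <= $c && $c <= ord($hi);
}

// 0x63 is plain 'c'; 0xE8 and 0xBE are c caron and z caron in ISO8859-2.
foreach (array(0x63, 0xE8, 0xBE) as $c) {
    $byRange = in_byte_range($c, 'a', 'z');
    $byRegex = preg_match('~[a-z]~', chr($c)) === 1;
    printf("0x%02X range=%s regex=%s\n",
        $c, var_export($byRange, true), var_export($byRegex, true));
}
```

The two columns agree for all three bytes (true for 0x63, false for 0xE8
and 0xBE), which is consistent with the [a-z] rows above staying at 0 in
both locales while [[:lower:]] and ctype_lower() follow LC_CTYPE.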

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE.             http://bash.org/?255991

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

