# neuhauser@xxxxxxxxxx / 2007-01-21 10:48:30 +0000:
> # jochem@xxxxxxxxxxxxx / 2007-01-21 00:11:13 +0100:
> > Roman Neuhauser wrote:
> > > # jochem@xxxxxxxxxxxxx / 2007-01-17 16:59:26 +0100:
> > >> wouldn't it be fair to assume (safety through paranoia) that
> > >> ctype_alnum() would suffer the same problem? (given the manual's
> > >> indication that ctype_alnum() and the offending regexp are equivalent?)
> > >
> > > isalnum(3) uses isalpha(3) and isdigit(3), so yes, their results are
> > > locale-dependent (LC_CTYPE, see setlocale(3)), but don't depend on
> > > collating sequence.
> >
> > so really the doc's are slightly misleading or even incorrect,
>
> Slightly, in a usually-behaves-as-described-but-for-different-reasons
> way.
>
> > as a side note: do you have any real world example of where this
> > collation issue might actually bite someone making use of the aforementioned
> > regexp range?
>
> Not off the top of my head. :(

Trying the Czech locale (I normally run with the values below), I've
come across some unexpected behavior. 0xE8 is c caron, and sorts
between c and d, but not on this computer. 0xBE is z caron, and sorts
just after z. I'd expect [a-z] to match 0xE8 but it does not.
LANG=cs_CZ.ISO8859-2
LC_COLLATE=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_MESSAGES=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
LC_TIME=en_US.ISO8859-1

roman@dagan ~/tmp/blemc 1042:0 > uname -srm
FreeBSD 6.1-PRERELEASE amd64

roman@dagan ~/tmp/blemc 1043:0 > cat ./collseq.php
#!/usr/bin/env php
<?php

function f($c, $l)
{
    printf("char=%c locale=%s\n", $c, $l);
    setlocale(LC_COLLATE, $l);
    setlocale(LC_CTYPE, $l);
    printf("[a-z] = %s\n", var_export(preg_match('~[a-z]~', chr($c)), 1));
    printf("[[:lower:]] = %s\n", var_export(preg_match('~[[:lower:]]~', chr($c)), 1));
    printf("islower(3) = %s\n", var_export(ctype_lower(chr($c)), 1));
    print "\n";
}

f(0xE8, 'C');
f(0xE8, 'cs_CZ.ISO8859-2');
f(0xBE, 'C');
f(0xBE, 'cs_CZ.ISO8859-2');

roman@dagan ~/tmp/blemc 1044:0 > ./collseq.php
char=č locale=C
[a-z] = 0
[[:lower:]] = 0
islower(3) = false

char=č locale=cs_CZ.ISO8859-2
[a-z] = 0
[[:lower:]] = 1
islower(3) = true

char=ž locale=C
[a-z] = 0
[[:lower:]] = 0
islower(3) = false

char=ž locale=cs_CZ.ISO8859-2
[a-z] = 0
[[:lower:]] = 1
islower(3) = true

--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man. You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
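
One possible reading of the [a-z] results above: without the caseless or
UTF-8 modifiers, the regexp range seems to be taken as a range of byte
values (0x61 to 0x7A) rather than as a span of the locale's collating
sequence, while [[:lower:]] and ctype_lower() follow LC_CTYPE. What follows
is only a minimal sketch for checking that reading on a given installation,
not a statement about every PCRE build. The helper name range_mismatches()
is hypothetical, and the optional locale argument assumes that the named
locale is installed.

```php
<?php

// Hypothetical helper: returns the byte values for which the regexp
// [a-z] and a plain byte-range test (0x61..0x7A) disagree.
function range_mismatches($locale = null)
{
    if ($locale !== null) {
        // Assumption: the locale exists on this system. setlocale()
        // returns false otherwise and the current locale stays in effect.
        setlocale(LC_COLLATE, $locale);
        setlocale(LC_CTYPE, $locale);
    }

    $mismatches = array();
    for ($c = 0; $c < 256; $c++) {
        $by_regex = preg_match('~[a-z]~', chr($c)) === 1;
        $by_range = ($c >= 0x61 && $c <= 0x7A);
        if ($by_regex !== $by_range) {
            $mismatches[] = $c;
        }
    }
    return $mismatches;
}

// Report how many of the 256 byte values behave differently from the
// plain byte-range test under the current locale.
printf("mismatches: %d\n", count(range_mismatches()));
```

Calling range_mismatches('cs_CZ.ISO8859-2') repeats the check under the
Czech locale, where that locale is available. An empty result in both cases
would be consistent with [a-z] not consulting the collating sequence at all.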