Summary: glibc or perl incorrect locale LC_CTYPE data

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166478

------- Additional Comments From jvdias@xxxxxxxxxx  2005-11-01 20:29 EST -------
One has to carefully analyse the perl man-pages to find out what is going on
here. I don't particularly agree with the way the upstream perl maintainers
have done this, but this is not a bug - it is the way perl is meant to behave.

The point is that /\w/ matches any ASCII word char, and /\W/ matches any ASCII
non-word char. To match a UTF-8 word character, you have to use \p{IsWord}.
The \w wildcard is a synonym for the POSIX-style character class [:word:].

So this version of your program:
---
#!/usr/bin/perl -w -C
use strict;
use utf8;
use locale;
use Encode qw(decode);

# U+00C1 "A with acute", U+010C "C with caron" (encoded in UTF-8)
my $str = decode('utf-8', "\xc3\x81\xc4\x8c");

print 'Is UTF-8:', utf8::is_utf8($str),
      ' is word:', $str =~ /^\w+$/,
      ' is UTF-8 word:', $str =~ /^\p{IsWord}+$/,
      ' str:', $str, "\n";