[Bug 166478] glibc or perl incorrect locale LC_CTYPE data

Summary: glibc or perl incorrect locale LC_CTYPE data


https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166478

------- Additional Comments From jvdias@xxxxxxxxxx  2005-11-01 20:29 EST -------
One has to carefully analyse the perl man-pages to find out what is going on here.

I don't particularly agree with the way the upstream perl maintainers have done
this, but this is not a bug; it is the way perl is meant to behave.

The point is that /\w/ matches any ASCII word character, and /\W/ matches any
ASCII non-word character.

To match a UTF-8 (Unicode) word character, you have to use \p{IsWord}.

The \w escape is equivalent to the POSIX-style character class [[:word:]]
(a Perl extension; POSIX itself does not define [:word:]).
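A minimal sketch of the distinction, assuming a perl of 5.14 or later (newer than the one in this report). On such perls the /a regex modifier explicitly restricts \w and [[:word:]] to the ASCII range [A-Za-z0-9_], while \p{IsWord} always tests the Unicode Word property. The helper word_checks is hypothetical, written only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Same two characters as in the program below: U+00C1 and U+010C,
# decoded from their UTF-8 bytes into a Perl character string.
my $str = decode('utf-8', "\xc3\x81\xc4\x8c");

# Hypothetical helper for this sketch: reports how each pattern treats $s.
# The /a modifier needs perl 5.14 or later (an assumption of this sketch).
sub word_checks {
    my ($s) = @_;
    return (
        ascii_w     => ($s =~ /^\w+$/a)         ? 1 : 0,  # \w limited to ASCII
        ascii_posix => ($s =~ /^[[:word:]]+$/a) ? 1 : 0,  # /a limits POSIX classes too
        unicode     => ($s =~ /^\p{IsWord}+$/)  ? 1 : 0,  # Unicode Word property
    );
}

my %r = word_checks($str);
print "$_: $r{$_}\n" for sort keys %r;
```

For the string above, both ascii_* checks print 0 and unicode prints 1, while a plain ASCII string such as "Word_42" passes all three. The "Is" prefix is optional, so \p{IsWord} and \p{Word} name the same property.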

So this version of your program:
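---
#!/usr/bin/perl -w

use strict;
use utf8;
use locale;
use Encode qw(decode);

# -C on the #! line does not set up the standard streams (perl 5.10.1
# and later reject it there), so request UTF-8 output explicitly.
binmode STDOUT, ':encoding(UTF-8)';

my $str = decode('utf-8', "\xc3\x81\xc4\x8c");
        # U+00C1 "A with acute", U+010C "C with caron" (encoded in UTF-8)

# Each match in list context yields 1 on success and nothing on failure.
print 'Is UTF-8:', utf8::is_utf8($str),
      ' is word:', $str =~ /^\w+$/,
      ' is UTF-8 word:', $str =~ /^\p{IsWord}+$/,
      ' str:', $str, "\n";
---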
