grep, ispunct() and locales/charsets

Hi,

I'm having some trouble getting regular expressions in grep that rely on
isalpha() and related functions to work. (I have read man 1 grep,
man 7 regex and man 3 isalpha.)

The LC_CTYPE sections of /usr/share/i18n/locales/en_US and mk_MK both
copy /usr/share/i18n/locales/i18n, in which <U2026> (HORIZONTAL
ELLIPSIS) is defined as a punctuation character.

I'm using a gnome-terminal whose character encoding I can set, so the
examples below have been converted back to UTF-8 in this mail. (I switch
the terminal encoding after each "export LANG"; that doesn't affect the
matching, by the way.)

"\xe2\x80\xa6" is the UTF-8 encoding of the horizontal ellipsis,
"\x85" is its encoding in CP1250, CP1251 and CP1252.
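
To rule out a mistake in those byte values, here is a minimal sketch of
how the mapping can be cross-checked with iconv. The function name
cp1252_to_utf8_hex is just something I made up for illustration, and the
byte is passed in octal (205 is 0x85) so that it works with any POSIX
printf:

```shell
# Hypothetical helper: take one byte in octal notation, treat it as
# CP1252, convert it to UTF-8 and print the resulting bytes in hex.
cp1252_to_utf8_hex() {
    printf "\\$1" | iconv -f CP1252 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# 0x85 (octal 205) should come out as the UTF-8 bytes of U+2026.
cp1252_to_utf8_hex 205
echo
```

This only checks the conversion table in iconv; it says nothing about
how grep or the ctype functions classify the byte.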

$ export LANG=en_US.UTF-8
$ echo -e "\xe2\x80\xa6"
…
$ echo -e "\xe2\x80\xa6" | grep "[[:punct:]]"
…
$ export LANG=en_US.CP1252
$ echo -e "\x85"
…
$ echo -e "\x85" | grep "[[:punct:]]"
$ export LANG=en_US.CP1251
$ echo -e "\x85"
…
$ echo -e "\x85" | grep "[[:punct:]]"
$ export LANG=en_US.CP1250
$ echo -e "\x85"
…
$ echo -e "\x85" | grep "[[:punct:]]"
$ export LANG=mk_MK.CP1251
$ echo -e "\x85"
…
$ echo -e "\x85" | grep "[[:punct:]]"
$

Note that CP1251 is explicitly mentioned in mk_MK.
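
One thing I have not ruled out is whether the locale names exported
above are actually resolved on this system. As a rough sketch, and
assuming glibc's "locale" utility, the charmap a locale name really ends
up with can be printed like this (effective_charmap is a made-up helper
name):

```shell
# Hypothetical helper: print the charmap that the given locale name
# resolves to. Warnings about unknown locales go to stderr, which is
# discarded here.
effective_charmap() {
    LC_ALL="$1" locale charmap 2>/dev/null
}

effective_charmap C
effective_charmap en_US.CP1252
effective_charmap mk_MK.CP1251
```

If a CP125x name printed the same charmap as the C locale, I would
suspect that the locale is not available and that the character classes
are being taken from the C locale instead, but that is only a guess.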

Now why doesn't the horizontal ellipsis get matched as a punctuation
character in the CP125x locales? Is this a bug? If so, where? If not,
how do I accomplish the match?

Leonard.

-- 
mount -t life -o ro /dev/dna /genetic/research

