Re: bug#60690: -P '\d' in GNU and git grep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 5, 2023 at 12:40 PM Jim Meyering <jim@xxxxxxxxxxxx> wrote:
>
> Changing grep -P's \d to match multibyte digits by default would break
> an important contract.

While I tend to agree[1] (and indeed that is why PCRE2_EXTRA_ASCII_BSD
was invented), it would be also important to note that it goes against
the Unicode recommendation[2] and it is actually not true already[3]
for Python, .NET or Rust (which means ripgrep behaves like GNU grep -P
3.9).

FWIW I also agree that (at least `git grep -P`) should use
PCRE2_EXTRA_ASCII_BSD by default as that is what makes more sense in
the context of matching source code and using instead `\P{Nd}` if you
really want all Unicode digits is not much of a burden, but I am also
not sure if that makes sense in other contexts, specially considering
that I am obviously biased since the languages I mostly interact with
ONLY use arabic numerals and therefore `\d` meaning `[0-9]` seems
"normal".

Carlo

CC: changed to the real email address for PCRE2 development, for full
context on this thread use [4]

[1] https://github.com/PCRE2Project/pcre2/pull/186
[2] https://unicode.org/reports/tr18/
[3] https://regex101.com/r/S5RW4c/1
[4] https://lore.kernel.org/git/230109.86v8lf297g.gmgdl@xxxxxxxxxxxxxxxxxxx/T/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux