On Wed, Apr 5, 2023 at 12:40 PM Jim Meyering <jim@xxxxxxxxxxxx> wrote:
>
> Changing grep -P's \d to match multibyte digits by default would break
> an important contract.

While I tend to agree[1] (and indeed that is why PCRE2_EXTRA_ASCII_BSD
was invented), it is also important to note that this goes against the
Unicode recommendation[2], and that the contract already does not hold[3]
for Python, .NET or Rust (which means ripgrep behaves like GNU grep -P 3.9).

FWIW I also agree that `git grep -P`, at least, should use
PCRE2_EXTRA_ASCII_BSD by default, as that is what makes more sense in
the context of matching source code, and using `\p{Nd}` instead if you
really want all Unicode digits is not much of a burden. I am not sure
that it makes sense in other contexts, though, especially considering
that I am obviously biased: the languages I mostly interact with ONLY
use Arabic numerals, and therefore `\d` meaning `[0-9]` seems "normal".

Carlo

CC: changed to the real email address for PCRE2 development; for full
context on this thread use [4]

[1] https://github.com/PCRE2Project/pcre2/pull/186
[2] https://unicode.org/reports/tr18/
[3] https://regex101.com/r/S5RW4c/1
[4] https://lore.kernel.org/git/230109.86v8lf297g.gmgdl@xxxxxxxxxxxxxxxxxxx/T/
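
To make the Python case mentioned above concrete, here is a minimal
sketch using only the standard `re` module. The sample string and the
variable names are made up for illustration; they are not taken from
the thread or from [3]. It shows that `\d` in a default (str) pattern
matches non-ASCII decimal digits, and that `re.ASCII` restricts it to
`[0-9]`, which is roughly the effect PCRE2_EXTRA_ASCII_BSD is meant to
have for PCRE2.

```python
import re

# Illustrative input (an assumption, not from the thread): ASCII digits
# followed by the Arabic-Indic digits U+0661..U+0663.
sample = "abc 123 \u0661\u0662\u0663"

# Python 3 str patterns are Unicode-aware by default, so \d matches any
# character in Unicode category Nd, not just 0-9.
unicode_digits = re.findall(r"\d+", sample)

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d+", sample, flags=re.ASCII)

# An explicit character class behaves the same with or without the flag.
explicit_ascii = re.findall(r"[0-9]+", sample)

print(unicode_digits)
print(ascii_digits)
print(explicit_ascii)
```

The first result contains both runs of digits, while the other two
contain only the ASCII run, so anyone relying on `\d` meaning `[0-9]`
in Python has to opt in explicitly.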