Re: [PATCH v2] grep: correctly identify utf-8 characters with \{b,w} in -P

On Mon, Jan 9, 2023 at 4:17 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> Rather than trying to opt-out with "/a" or "/aa" I think this should be opt-in.
>
> As the example at the start shows you can already do this with "(*UCP)"
> in the pattern, so perhaps we should just link to the pcre2pattern(3)
> manual from git-grep(1)?

Considering that PCRE is used internally even for cases that don't
specify -P, how would that opt-in work?

For example, in a repository with code that uses UTF-8 identifiers, the
following will fail:

  $ git grep -w -E motion
  u.c:  int émotion = 0;
  $ git grep -w -E '(*UCP)motion'
  fatal: command line, '(*UCP)motion': Invalid preceding regular expression
  $ git -P grep -P -w '(*UCP)motion'
  u.c:  int émotion = 0;
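
The underlying difference between ASCII and Unicode word boundaries can
be sketched outside of git. This is only an analogy: it uses Python's re
module, not PCRE2 or git's own -w handling, and the sample line is copied
from the transcript above. With ASCII rules "é" is not a word character,
so "motion" looks like a whole word; with Unicode rules it is one, and
the match goes away.

```python
import re

# Sample line from the transcript; "\u00e9" is the precomposed "é".
line = "int \u00e9motion = 0;"

# ASCII semantics: "é" is not a word character, so a word boundary
# sits between "é" and "m" and the bare word "motion" matches.
ascii_hit = re.search(r"\bmotion\b", line, flags=re.ASCII)

# Unicode semantics (Python's default for str patterns): "é" is a word
# character, so there is no boundary before "m" and nothing matches.
unicode_hit = re.search(r"\bmotion\b", line)

print(bool(ascii_hit), bool(unicode_hit))
```

The ASCII search plays the role of the false positive shown above, and
the Unicode search is roughly what "(*UCP)" asks PCRE2 for.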

Carlo

CC: removed the gnu and the obsolete PCRE developer lists (if really
needed, it would be better to use the documented
pcre2-dev@xxxxxxxxxxxxxxxx instead)



