On 03.01.23 at 19:13, Marco Nenciarini wrote:
> On 03/01/23 17:29, René Scharfe wrote:
>> On 03.01.23 at 10:53, Marco Nenciarini wrote:
>>> I've installed latest git from brew and suddenly git grep started
>>> behaving oddly when using alternatives.
>>>
>>> ```
>>> $ echo abd > test.file
>>> $ git grep --untracked 'a\(b\|c\)d' test.file
>>> $ git grep --untracked 'a\(b\)d' test.file
>>> test.file:abd
>>> ```
>>>
>>> It should have matched in both cases.
>>>
>>>
>>> If I switch to extended pattern it starts working again
>>>
>>> ```
>>> $ git grep --untracked -E 'a(b|c)d' test.file
>>> test.file:abd
>>> ```
>>
>> This is expected: Basic regular expressions don't support alternation.
>>
>> Under which circumstances did it work for you without -E or -P?
>>
>
> It has always worked until I installed 2.39.0 on my mac. I've also
> checked with other developers that are using a mac and they report
> the same. On Linux it works regardless of the git version.
>
> Using grep directly also works, so it is a different behavior between
> grep and git grep, that is surprising.

Meaning you used Apple's version of git before?

   $ echo abd > test.file
   $ /usr/bin/git grep --untracked 'b\|c' test.file
   test.file:abd
   $ /usr/bin/git version
   git version 2.37.1 (Apple Git-137.1)
   $ git grep --untracked 'b\|c' test.file
   $ git version
   git version 2.39.0

So I can reproduce your findings on macOS Ventura. Same with grep:

   $ grep 'b\|c' test.file
   abd
   $ grep --version
   grep (BSD grep, GNU compatible) 2.6.0-FreeBSD

re_format(7) says: "Obsolete (“basic”) regular expressions differ in
several respects. ‘|’ is an ordinary character and there is no
equivalent for its functionality."

Under the headline "ENHANCED FEATURES" it continues, however: "When the
REG_ENHANCED flag is passed to one of the regcomp() variants, additional
features are activated."

And: "For enhanced basic REs, ‘+’, ‘?’ and ‘|’ remain regular
characters, but ‘\+’, ‘\?’ and ‘\|’ have the same special meaning as the
unescaped characters do for extended REs, i.e., one or more matches,
zero or one matches and alteration, respectively."

So apparently Apple chose a middle ground between basic and extended
regular expressions for its grep and git grep.

Under "IMPLEMENTATION CHOICES" it says: "The regex implementation in Mac
OS X 10.8 and later is based on a heavily modified subset of TRE
(http://laurikari.net/tre/)."

The manpage of GNU grep says: "grep understands three different versions
of regular expression syntax: “basic” (BRE), “extended” (ERE) and “perl”
(PCRE). In GNU grep there is no difference in available functionality
between basic and extended syntax. In other implementations, basic
regular expressions are less powerful."

And under the headline "Basic vs Extended Regular Expressions": "In
basic regular expressions the meta-characters ?, +, {, |, (, and ) lose
their special meaning; instead use the backslashed versions \?, \+, \{,
\|, \(, and \)."

So GNU grep apparently made the same choice as Apple, probably far
earlier.

When I compile git with NO_REGEX and thus with the fallback in
compat/regex/, the result supports alternation as well:

   $ ./git grep --untracked 'b\|c' test.file
   test.file:abd
   $ nm ./git | grep regcomp
   0000000100255978 T _git_regcomp

Based on that I suggest:

--- >8 ---
Subject: grep: use REG_ENHANCED on macOS

GNU grep(1) and regcomp(3) use enhanced basic regular expressions by
default, which means that they support alternation, e.g. "a\|b" matches
both "a" and "b".  The same is true for our compat/regex/
implementation.

On macOS Ventura (and possibly earlier) grep(1) also uses enhanced BREs,
but regcomp(3) requires the flag REG_ENHANCED to turn them on.  Do that
for git grep if possible, for consistency with the platform's grep(1)
and our fallback library.

It would be simpler to use REG_ENHANCED everywhere it is defined instead
of adding a Makefile setting, but it's a non-standard extension and
might mean something else on other platforms.

Reported-by: Marco Nenciarini <marco.nenciarini@xxxxxxxxxxxxxxxx>
Signed-off-by: René Scharfe <l.s.r@xxxxxx>
---
 Makefile         | 8 ++++++++
 config.mak.uname | 1 +
 grep.c           | 4 ++++
 3 files changed, 13 insertions(+)

diff --git a/Makefile b/Makefile
index db447d0738..15e7edc9d2 100644
--- a/Makefile
+++ b/Makefile
@@ -289,6 +289,10 @@ include shared.mak
 # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
 # feature.
 #
+# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag
+# REG_ENHANCED to enable enhanced basic regular expressions and you'd
+# like to use it in git grep.
+#
 # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
 # user.
 #
@@ -2037,6 +2041,10 @@ endif
 ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
+else
+ifdef GIT_GREP_USES_REG_ENHANCED
+	COMPAT_CFLAGS += -DGIT_GREP_USES_REG_ENHANCED
+endif
 endif
 ifdef NATIVE_CRLF
 	BASIC_CFLAGS += -DNATIVE_CRLF
diff --git a/config.mak.uname b/config.mak.uname
index d63629fe80..14c145ae42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin)
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
 	CSPRNG_METHOD = arc4random
+	GIT_GREP_USES_REG_ENHANCED = YesPlease
 
 	# Workaround for `gettext` being keg-only and not even being linked via
 	# `brew link --force gettext`, should be obsolete as of
diff --git a/grep.c b/grep.c
index 06eed69493..a8430daaba 100644
--- a/grep.c
+++ b/grep.c
@@ -502,6 +502,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		regflags |= REG_ICASE;
 	if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE)
 		regflags |= REG_EXTENDED;
+#if defined(GIT_GREP_USES_REG_ENHANCED) && defined(REG_ENHANCED)
+	else
+		regflags |= REG_ENHANCED;
+#endif
 	err = regcomp(&p->regexp, p->pattern, regflags);
 	if (err) {
 		char errbuf[1024];
-- 
2.39.0
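
For anyone who wants to check what their platform's regcomp(3) does
without building git, here is a minimal sketch of the same idea outside
of grep.c.  The helper names enhanced_bre_flag() and regex_matches() are
made up for illustration and are not part of git; the only library calls
are the standard regcomp(), regexec() and regfree().  It assumes that
REG_ENHANCED, where defined, has the meaning described in re_format(7)
on macOS.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of git: yields REG_ENHANCED where the
 * C library defines it (e.g. on macOS) and 0 everywhere else.
 */
static int enhanced_bre_flag(void)
{
#ifdef REG_ENHANCED
	return REG_ENHANCED;
#else
	return 0;
#endif
}

/*
 * Hypothetical helper, not part of git: compile `pattern` with the
 * given regcomp(3) flags and run it against `text`.
 * Returns 1 on a match, 0 on no match and -1 if compilation fails.
 */
static int regex_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int ret;

	/* REG_NOSUB: only the match/no-match result is of interest. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB))
		return -1;
	ret = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return ret == 0 ? 1 : 0;
}
```

Going by the findings above, regex_matches("b\\|c", "abd", 0) would be
expected to report no match on macOS Ventura, while passing
enhanced_bre_flag() as the last argument should make it match, as it
does with glibc either way.  I have only checked this through git grep
and grep(1), not with this snippet.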