Am 06.01.23 um 10:09 schrieb Jeff King: > On Wed, Jan 04, 2023 at 05:36:21PM +0100, René Scharfe wrote: > >>> I didn't test, but just from looking at the patch I'd expect this to >>> affect other parts of Git besides git-grep. E.g., "git log --grep". >>> Which raises two questions: >>> >>> - would a more generalized name be better? USE_REG_ENHANCED or >>> something? That might be _too_ general, but see below. >>> >>> - should this cover other cases? Grepping for "regcomp", would people >>> want this to behave consistently for "git config --get-regexp", or >>> diff funcnames, and so on? >>> >>> If so, then I could envision a USE_REG_ENHANCED which just wraps the >>> system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not >>> set? >> >> Good point. I don't know what people want, though. re_format(7) on >> macOS/BSD and regex(7) on Linux call basic REs "obsolete" and extended >> REs "modern", so they seem to push people away from the old kind, >> enhanced or not. > > Oh, good point. I was just grepping for regcomp(), but of course any > case which is already passing REG_EXTENDED would not be affected anyway. > And most places are already using that. E.g., the config code always > does so, and it looks like pickaxe "-G" does so. > > For diffs, we have diff.*.xfuncname, which uses EREs. We do still > support regular "funcname" for backwards compatibility, but we only > document the extended version. Ironically, that option was introduced > because BREs did not portably support things like alternation, even with > the "enhanced" syntax. ;) See 45d9414fa5 (diff.*.xfuncname which uses > "extended" regex's for hunk header selection, 2008-09-18). > > So I think we are embracing the "everyone should use EREs" mentality > already. The only spots I see that use BREs are: > > - grep.c, which handles "git grep" and "git log --grep" > > - line-range.c, presumably for "-L" function matching > > - deprecated non-ERE funcname patterns > > Your patch is handling the first, which is by the far most important. I > would be OK leaving the others as-is, but I also wouldn't mind a patch > that works at the regcomp() level to make things automatically > consistent. There's also the code handling "git log -i -S nonäscii" in diffcore_pickaxe(), but it quotes special characters and thus effectively does a fixed-string search, so you're right in omitting it above. REG_ENHANCED on macOS affects REG_EXTENDED as well. It unlocks e.g. non-greedy repetitions and inline comments. Sounds nice, but also potentially surprising. I was unable to find a current version of the re_format(7) manpage online, unfortunately. Apple's latest version of Git sets NO_REGEX and thus uses compat/regex, if I read their source correctly: https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302 The easiest and most consistent option would be to do the same. But we can't do that, because it would break multibyte support, which was fixed by 1819ad327b (grep: fix multibyte regex handling under macOS, 2022-08-26), which started to use the system regex functions again. Which begs the question: Isn't that a problem for the platforms that still have to use NO_REGEX? Shouldn't we fix compat/regex? Anyway, here's an attempt at a more general, but still targeted fix for macOS: --- >8 --- Subject: [PATCH] use enhanced basic regular expressions on macOS When 1819ad327b (grep: fix multibyte regex handling under macOS, 2022-08-26) started to use the native regex library instead of Git's own (compat/regex/), it lost support for alternation in basic regular expressions. Bring it back by enabling the flag REG_ENHANCED on macOS when compiling basic regular expressions. Reported-by: Marco Nenciarini <marco.nenciarini@xxxxxxxxxxxxxxxx> Suggested-by: Jeff King <peff@xxxxxxxx> Signed-off-by: René Scharfe <l.s.r@xxxxxx> --- Makefile | 9 +++++++++ compat/regcomp_enhanced.c | 9 +++++++++ config.mak.uname | 1 + git-compat-util.h | 5 +++++ 4 files changed, 24 insertions(+) create mode 100644 compat/regcomp_enhanced.c diff --git a/Makefile b/Makefile index db447d0738..46e30be673 100644 --- a/Makefile +++ b/Makefile @@ -289,6 +289,10 @@ include shared.mak # Define NO_REGEX if your C library lacks regex support with REG_STARTEND # feature. # +# Define USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS if your C library provides +# the flag REG_ENHANCED and you'd like to use it to enable enhanced basic +# regular expressions. +# # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the # user. # @@ -2037,6 +2041,11 @@ endif ifdef NO_REGEX COMPAT_CFLAGS += -Icompat/regex COMPAT_OBJS += compat/regex/regex.o +else +ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS + COMPAT_CFLAGS += -DUSE_ENHANCED_BASIC_REGULAR_EXPRESSIONS + COMPAT_OBJS += compat/regcomp_enhanced.o +endif endif ifdef NATIVE_CRLF BASIC_CFLAGS += -DNATIVE_CRLF diff --git a/compat/regcomp_enhanced.c b/compat/regcomp_enhanced.c new file mode 100644 index 0000000000..84193ce53b --- /dev/null +++ b/compat/regcomp_enhanced.c @@ -0,0 +1,9 @@ +#include "../git-compat-util.h" +#undef regcomp + +int git_regcomp(regex_t *preg, const char *pattern, int cflags) +{ + if (!(cflags & REG_EXTENDED)) + cflags |= REG_ENHANCED; + return regcomp(preg, pattern, cflags); +} diff --git a/config.mak.uname b/config.mak.uname index d63629fe80..7d25995265 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin) FREAD_READS_DIRECTORIES = UnfortunatelyYes HAVE_NS_GET_EXECUTABLE_PATH = YesPlease CSPRNG_METHOD = arc4random + USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS = YesPlease # Workaround for `gettext` being keg-only and not even being linked via # `brew link --force gettext`, should be obsolete as of diff --git a/git-compat-util.h b/git-compat-util.h index 76e4b11131..1efa834089 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1338,6 +1338,11 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size, return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND); } +#ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS +int git_regcomp(regex_t *preg, const char *pattern, int cflags); +#define regcomp git_regcomp +#endif + #ifndef DIR_HAS_BSD_GROUP_SEMANTICS # define FORCE_DIR_SET_GID S_ISGID #else -- 2.39.0