I looked a little further and found an interesting entry in the PCRE2's changelog for version 10.35: https://github.com/PCRE2Project/pcre2/blob/pcre2-10.35/ChangeLog#L66: 17. Fix a crash which occurs when the character type of an invalid UTF character is decoded in JIT. Its fix commit https://github.com/PCRE2Project/pcre2/commit/c21bd977547d and associated unit test is basically doing the same as what Stephane is running into: making the JIT compiled code choke on something that's not a valid UTF-8 string. So it looks like we're out of luck and have to implement yet another quirk to handle these broken versions. We can either disable the JIT compiler completely for these versions (and exchange the segfault for a serve performance regression) or fall-back to the previous behaviour and ignore Unicode properties (and reintroduce the bug commit acabd2048ee0 ("grep: correctly identify utf-8 characters with \{b,w} in -P") wanted to fix). I went with the second option and could confirm the below patch fixes the segfault on Ubuntu 20.04 which is shipping the broken version. Junio, what's your call on it? Below patch or simply a revert of commit acabd2048ee0? Other ideas? Thanks, Mathias -- >8 -- Subject: [PATCH] grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34 Stephane is reporting[1] a regression introduced in git v2.40.0 that leads to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an older version of libpcre2 that triggers a wild pointer dereference in the generated JIT code that was fixed in PCRE2 10.35. Instead of completely disabling the JIT compiler for the buggy version, just mask out the Unicode property handling as we used to do prior to commit acabd2048ee0 ("grep: correctly identify utf-8 characters with \{b,w} in -P"). [1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@xxxxxxxxxx/ Reported-by: Stephane Odul <stephane@xxxxxxxxxx> Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx> --- grep.c | 11 ++++++++++- grep.h | 3 +++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/grep.c b/grep.c index cee44a78d044..d249beae60d0 100644 --- a/grep.c +++ b/grep.c @@ -317,8 +317,17 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt } options |= PCRE2_CASELESS; } - if (!opt->ignore_locale && is_utf8_locale() && !literal) + if (!opt->ignore_locale && is_utf8_locale() && !literal) { options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF); +#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER + /* + * Work around a JIT bug related to invalid Unicode character + * handling fixed in 10.35: + * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d + */ + options ^= PCRE2_UCP; +#endif + } #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */ diff --git a/grep.h b/grep.h index 6075f997e68f..c59592e3bdba 100644 --- a/grep.h +++ b/grep.h @@ -7,6 +7,9 @@ #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 #define GIT_PCRE2_VERSION_10_36_OR_HIGHER #endif +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 35) || PCRE2_MAJOR >= 11 +#define GIT_PCRE2_VERSION_10_35_OR_HIGHER +#endif #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11 #define GIT_PCRE2_VERSION_10_34_OR_HIGHER #endif -- 2.39.2