On 23.03.23 17:19, Junio C Hamano wrote: > Mathias Krause <minipli@xxxxxxxxxxxxxx> writes: > >> ... or fall-back to the previous behaviour and >> ignore Unicode properties (and reintroduce the bug commit acabd2048ee0 >> ("grep: correctly identify utf-8 characters with \{b,w} in -P") wanted >> to fix). >> >> I went with the second option and could confirm the below patch fixes >> the segfault on Ubuntu 20.04 which is shipping the broken version. >> >> Junio, what's your call on it? Below patch or simply a revert of commit >> acabd2048ee0? Other ideas? > > Thanks for all the investigation and a prompt fix. Very much > appreciated. As there was the slight chance it could be caused by my grep.c change, I had the ambition to prove it either way. :D > As you identified where the breakage ended in the versions of pcre, > I think that the approach the patch I am responding to takes is more > sensible than a straight revert. Ok. It's really unfortunate that PCRE2 has so many bugs we need to work around :/ > But I somehow suspect that it may be better to have the "workaround" > independent of the line acabd204 (grep: correctly identify utf-8 > characters with \{b,w} in -P, 2023-01-08) touched, more like how we > "Work around ... fixed in 10.36". > > IOW, not like this > >> options |= PCRE2_CASELESS; >> } >> - if (!opt->ignore_locale && is_utf8_locale() && !literal) >> + if (!opt->ignore_locale && is_utf8_locale() && !literal) { >> options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF); >> +#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER >> + /* >> + * Work around a JIT bug related to invalid Unicode character >> + * handling fixed in 10.35: >> + * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d >> + */ >> + options ^= PCRE2_UCP; >> +#endif >> + } > > but more like > > if (!opt->ignore_locale && is_utf8_locale() && !literal) > options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF); > > +#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER > + /* > + * Work around a JIT bug related to invalid Unicode character > + * handling fixed in 10.35: > + * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d > + */ > + options ^= PCRE2_UCP; > +#endif > + > #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER > /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */ > if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS)) > > That way, no matter how we ended up setting the UCP bit in the > options variable, we would avoid broken UCP handling on problematic > versions, no? Ah, sure! It's no issue with the current code base, as PCRE2_UCP only gets set in that branch, but there's no need to narrow the workaround to it as well. I'll send a v2 in a few! Thanks, Mathias > >> #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER >> /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */ >> diff --git a/grep.h b/grep.h >> index 6075f997e68f..c59592e3bdba 100644 >> --- a/grep.h >> +++ b/grep.h >> @@ -7,6 +7,9 @@ >> #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 >> #define GIT_PCRE2_VERSION_10_36_OR_HIGHER >> #endif >> +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 35) || PCRE2_MAJOR >= 11 >> +#define GIT_PCRE2_VERSION_10_35_OR_HIGHER >> +#endif >> #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11 >> #define GIT_PCRE2_VERSION_10_34_OR_HIGHER >> #endif