Mathias Krause <minipli@xxxxxxxxxxxxxx> writes: > ... or fall-back to the previous behaviour and > ignore Unicode properties (and reintroduce the bug commit acabd2048ee0 > ("grep: correctly identify utf-8 characters with \{b,w} in -P") wanted > to fix). > > I went with the second option and could confirm the below patch fixes > the segfault on Ubuntu 20.04 which is shipping the broken version. > > Junio, what's your call on it? Below patch or simply a revert of commit > acabd2048ee0? Other ideas? Thanks for all the investigation and a prompt fix. Very much appreciated. As you identified where the breakage ended in the versions of pcre, I think that the approach the patch I am responding to takes is more sensible than a straight revert. But I somehow suspect that it may be better to have the "workaround" independent of the line acabd204 (grep: correctly identify utf-8 characters with \{b,w} in -P, 2023-01-08) touched, more like how we "Work around ... fixed in 10.36". IOW, not like this > options |= PCRE2_CASELESS; > } > - if (!opt->ignore_locale && is_utf8_locale() && !literal) > + if (!opt->ignore_locale && is_utf8_locale() && !literal) { > options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF); > +#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER > + /* > + * Work around a JIT bug related to invalid Unicode character > + * handling fixed in 10.35: > + * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d > + */ > + options ^= PCRE2_UCP; > +#endif > + } but more like if (!opt->ignore_locale && is_utf8_locale() && !literal) options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF); +#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER + /* + * Work around a JIT bug related to invalid Unicode character + * handling fixed in 10.35: + * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d + */ + options ^= PCRE2_UCP; +#endif + #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */ if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS)) That way, no matter how we ended up setting the UCP bit in the options variable, we would avoid broken UCP handling on problematic versions, no? > #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER > /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */ > diff --git a/grep.h b/grep.h > index 6075f997e68f..c59592e3bdba 100644 > --- a/grep.h > +++ b/grep.h > @@ -7,6 +7,9 @@ > #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11 > #define GIT_PCRE2_VERSION_10_36_OR_HIGHER > #endif > +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 35) || PCRE2_MAJOR >= 11 > +#define GIT_PCRE2_VERSION_10_35_OR_HIGHER > +#endif > #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11 > #define GIT_PCRE2_VERSION_10_34_OR_HIGHER > #endif