Re: Suspected git grep regression in git 2.40.0 - proposed fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I looked a little further and found an interesting entry in the PCRE2's
changelog for version 10.35:

https://github.com/PCRE2Project/pcre2/blob/pcre2-10.35/ChangeLog#L66:

  17. Fix a crash which occurs when the character type of an invalid UTF
  character is decoded in JIT.

Its fix commit https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
and associated unit test is basically doing the same as what Stephane is
running into: making the JIT compiled code choke on something that's not
a valid UTF-8 string.

So it looks like we're out of luck and have to implement yet another
quirk to handle these broken versions. We can either disable the JIT
compiler completely for these versions (and exchange the segfault for a
serve performance regression) or fall-back to the previous behaviour and
ignore Unicode properties (and reintroduce the bug commit acabd2048ee0
("grep: correctly identify utf-8 characters with \{b,w} in -P") wanted
to fix).

I went with the second option and could confirm the below patch fixes
the segfault on Ubuntu 20.04 which is shipping the broken version.

Junio, what's your call on it? Below patch or simply a revert of commit
acabd2048ee0? Other ideas?

Thanks,
Mathias

-- >8 --
Subject: [PATCH] grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34

Stephane is reporting[1] a regression introduced in git v2.40.0 that leads
to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an
older version of libpcre2 that triggers a wild pointer dereference in
the generated JIT code that was fixed in PCRE2 10.35.

Instead of completely disabling the JIT compiler for the buggy version,
just mask out the Unicode property handling as we used to do prior to
commit acabd2048ee0 ("grep: correctly identify utf-8 characters with
\{b,w} in -P").

[1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@xxxxxxxxxx/

Reported-by: Stephane Odul <stephane@xxxxxxxxxx>
Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx>
---
 grep.c | 11 ++++++++++-
 grep.h |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index cee44a78d044..d249beae60d0 100644
--- a/grep.c
+++ b/grep.c
@@ -317,8 +317,17 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && !literal)
+	if (!opt->ignore_locale && is_utf8_locale() && !literal) {
 		options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
+#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
+		/*
+		 * Work around a JIT bug related to invalid Unicode character
+		 * handling fixed in 10.35:
+		 * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
+		 */
+		options ^= PCRE2_UCP;
+#endif
+	}
 
 #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
 	/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
diff --git a/grep.h b/grep.h
index 6075f997e68f..c59592e3bdba 100644
--- a/grep.h
+++ b/grep.h
@@ -7,6 +7,9 @@
 #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
 #define GIT_PCRE2_VERSION_10_36_OR_HIGHER
 #endif
+#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 35) || PCRE2_MAJOR >= 11
+#define GIT_PCRE2_VERSION_10_35_OR_HIGHER
+#endif
 #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11
 #define GIT_PCRE2_VERSION_10_34_OR_HIGHER
 #endif
-- 
2.39.2




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux