Am 31.01.22 um 22:01 schrieb Ævar Arnfjörð Bjarmason: > > On Sun, Jan 30 2022, René Scharfe wrote: > >> Am 30.01.22 um 10:04 schrieb SZEDER Gábor: >>> On Sun, Jan 30, 2022 at 08:55:02AM +0100, René Scharfe wrote: >>>> e0c6029 (Fix inifinite loop when a single byte newline is searched in >>>> JIT., 2020-05-29) [1] sounds like it might have fixed it. It's part of >>>> version 10.36. >>> >>> I saw this hang on two Ubuntu 20.04 based boxes, which predate that >>> fix you mention only by a month or two, and apparently the almost two >>> years since then was not enough for this fix to trickle down into >>> updated 20.04 pcre packages, because: >>> >>>> Do you still get the error when you disable JIT, i.e. when you use the >>>> pattern "(*NO_JIT)^\s" instead? >>> >>> No, with this pattern it works as expected. >>> >>> So is there a more convenient way to disable PCRE JIT in Git? FWIW, >>> (non-git) 'grep -P' works with the same patterns. >> >> I don't know a better way. We could do it automatically, though: >> >> --- >8 --- >> Subject: [PATCH] grep: disable JIT on PCRE2 before 10.36 to avoid endless loop >> >> Commit e0c6029 (Fix inifinite loop when a single byte newline is >> searched in JIT., 2020-05-29) of PCRE2 adds the following point to its >> ChangeLog for version 10.36: >> >> 2. Fix inifinite loop when a single byte newline is searched in JIT when >> invalid utf8 mode is enabled. >> >> Avoid that bug on older versions (which are still reportedly found in >> the wild) by disabling the JIT when handling UTF-8. >> >> Reported-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> >> Signed-off-by: René Scharfe <l.s.r@xxxxxx> >> --- >> Not sure how to test it. Killing git grep after a second or so seems a >> bit clumsy. timeout(1) from GNU coreutils at least allows doing that >> from the shell, but it's not a standard tool. Perhaps we need a new >> test helper for that purpose? https://mywiki.wooledge.org/BashFAQ/068 offers a Perl-based Shell function or aborting a program if it takes too long: doalarm() { perl -e 'alarm shift; exec @ARGV' -- "$@"; } It doesn't waste time when the program finishes faster and seems to work fine with git grep. I can't actually test the effectiveness of the patch because PCRE2's JIT doesn't work on my development machine at all (Apple M1), as I just discovered. :-/ While we know that disabling JIT helps, we didn't actually determine, yet, if e0c6029 (Fix inifinite loop when a single byte newline is searched in JIT., 2020-05-29) really fixes the "^\s" bug. So I have to abandon this patch, unfortunately. Any volunteer to pick it up? >> >> grep.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/grep.c b/grep.c >> index 7bb0360869..16629a2301 100644 >> --- a/grep.c >> +++ b/grep.c >> @@ -406,6 +406,14 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt >> } >> >> pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); >> +#ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER >> + /* >> + * Work around the bug fixed by e0c6029 (Fix inifinite loop when a > > Better to quote this as PhilipHazel/pcre2@e0c6029 or something, i.e. to > indicate that it's not git.git's commit. The context should suffice for a human reader to understand that it's a PCRE2 about a bug fix, but I can see some program turning hex strings into repo-local links getting confused. Is there a standard cross-repo citation format? Using a Github link might be the most practical way for the near term, I suspect. > >> + * single byte newline is searched in JIT., 2020-05-29). >> + */ >> + if (options & PCRE2_MATCH_INVALID_UTF) >> + p->pcre2_jit_on = 0; > > It seems rather heavy-hande, but I can't think of a better way to deal > with this, i.e. if we selectively use JIT on older versions, surely we > run into the match-bytes-but-want-chars bug you were fixing. > >> +#endif >> if (p->pcre2_jit_on) { >> jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); >> if (jitret) >