On 21.03.23 17:33, Junio C Hamano wrote: > [jc: on the CC: line, summoned a few people who may know a lot more > about pcre than I do] > > Stephane Odul <stephane@xxxxxxxxxx> writes: > >> We have a CI pipeline on a private repository that started failing >> consistently while running `git grep -P` commands. The command >> fails with an exit code of -11 and is failing pretty >> consistently. With prior versions of git there is no issue >> whatsoever, but with 2.40.0 it always fails on the same >> call. Other git grep calls are fine, but that one is failing >> consistently. >> >> I was not able to reproduce locally but my main machine is an M1 >> MacBook Pro, the CI pipeline runs under Kubernetes in AWS and the >> container is based on Ubuntu 20.04 with the git client installed >> via the PPA. >> >> The error is for this pattern `git grep -cP '^\w+ = json.load'`. >> >> As a workaround we tried to download and install the microsoft-git >> v2.39.2 deb package since it allows us to downgrade, but then the >> git grep commands just got stuck. > > One "grep -P" related change we had recently between 2.39 and 2.40 > was > > 50b6ad55 (grep: fall back to interpreter if JIT memory > allocation fails, 2023-01-31) > > The code tries to figure out at runtime if pcre engine has > functioning JIT by making an extra JIT compilation of a sample > pattern and when it fails with a specific reason, fall back to > interpreted pattern matching. The change made the code to emit a > bit more detailed information when it fails, but a controlled exit > from the codepath should give $?=128, not 11. > > So the above commit may or may not be related. It could be that the > version of pcre library linked to Git 2.40 and older Git you are > running in your CI environment has been updated. I don't think it's that commit. I just tried the pattern here on a system that would require the interpreter fallback and one that wouldn't and both didn't error out (Debian bookworm/sid; libpcre2-8: 10.42-1). Looking at the history, though, another commit stands out: $ git log --oneline --no-merges v2.39.0..v2.40.0 -- grep.c fb2ebe72a374 grep API: plug memory leaks by freeing "header_list" 891c9965fbc0 grep.c: refactor free_grep_patterns() 50b6ad55b04d grep: fall back to interpreter if JIT memory allocation fails acabd2048ee0 grep: correctly identify utf-8 characters with \{b,w} in -P Commit acabd2048ee0 suspiciously matches the pattern with its "\w" mention. Looking at the diff gives: if (!opt->ignore_locale && is_utf8_locale() && !literal) - options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF); + options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF); So maybe the PCRE2_UCP option isn't supported by the system in question or it forces PCRE2 to do a verification it didn't do before? The exit code of -11 could relate to PCRE2's error PCRE2_ERROR_UTF8_ERR9 (-11) strengthening the suspicion it's related to that change. It's described as: PCRE2_ERROR_UTF8_ERR9 5th-byte's two top bits are not 0x80 Maybe the input data isn't valid UTF-8 encoded? Stephane, can you pin point the file that the grep is failing on and verify it is indeed a valid UFT-8 encoded file via: $ iconv -f UTF-8 your_file > /dev/null && echo OK || echo "Not valid" If this prints "OK", we need to investigate further. However, if it prints "Not valid", a broken file encoding is probably the cause and we should revisit the above change and think of alternatives, as this might be seen as a regression (even though the file isn't strictly valid UTF-8). Thanks, Mathias > > Does it make a difference if you disable JIT by prefixing the > pattern with (*NO_JIT)? > > Thanks.