Hi Ævar, On Thu, 27 Jun 2019, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Jun 27 2019, Johannes Schindelin wrote: > > > On Wed, 26 Jun 2019, Johannes Schindelin wrote: > > > >> On Wed, 26 Jun 2019, Ævar Arnfjörð Bjarmason wrote: > >> > >> > This speeds things up a lot, but as shown in the patches & tests > >> > changed modifies the behavior where we have \0 in *patterns* (only > >> > possible with 'grep -f <file>'). > >> > >> I agree that it is not worth a lot to care about NULs in search patterns. > >> > >> So I am in favor of the goal of this patch series. > > > > There seems to be a Windows-specific test failure: > > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11535&view=ms.vss-test-web.build-test-results-tab&runId=28232&resultId=101315&paneView=debug > > > > The output is this: > > > > -- snip -- > > not ok 5 - log --grep does not find non-reencoded values (latin1) > > > > expecting success: > > git log --encoding=ISO-8859-1 --format=%s --grep=$utf8_e >actual > > && > > test_must_be_empty actual > > > > ++ git log --encoding=ISO-8859-1 --format=%s --grep=é > > fatal: pcre2_match failed with error code -8: UTF-8 error: byte 2 top bits > > not 0x80 > > -- snap -- > > > > Any quick ideas? (I _could_ imagine that it is yet another case of passing > > non-UTF-8-encoded stuff via command-line vs via file, which does not work > > on Windows.) > > This is an existing issue that my patches just happen to uncover. I'm > working on a v2 which'll fix it. I just found yet another problem. When using a libpcre2 _without_ JIT support, I get this: -- snip -- $ sh t4210-log-i18n.sh -i -V -x || tail -20 test-results/t4210-log-i18n.out ok 1 - create commits in different encodings ok 2 - log --grep searches in log output encoding (utf8) ok 3 # skip log --grep searches in log output encoding (latin1) (missing !MINGW) ok 4 # skip log --grep does not find non-reencoded values (utf8) (missing !MINGW) not ok 5 - log --grep does not find non-reencoded values (latin1) # # git log --encoding=ISO-8859-1 --format=%s --grep=$utf8_e # >actual && # test_must_be_empty actual # ok 3 # skip log --grep searches in log output encoding (latin1) (missing !MINGW) skipping test: log --grep does not find non-reencoded values (utf8) git log --encoding=utf8 --format=%s --grep=$latin1_e >actual && test_must_be_empty actual ok 4 # skip log --grep does not find non-reencoded values (utf8) (missing !MINGW) expecting success: git log --encoding=ISO-8859-1 --format=%s --grep=$utf8_e >actual && test_must_be_empty actual ++ git log --encoding=ISO-8859-1 --format=%s --grep=é fatal: pcre2_match failed with error code -8: UTF-8 error: byte 2 top bits not 0x80 error: last command exited with $?=128 not ok 5 - log --grep does not find non-reencoded values (latin1) # # git log --encoding=ISO-8859-1 --format=%s --grep=$utf8_e # >actual && # test_must_be_empty actual # -- snap -- This is actually a correct error, as we specifically feed non-UTF-8 text to PCRE2, but we do turn on the PCRE2_UTF flag. Funnily enough, this error only occurs when `pcre2_jit_on == 0`, i.e. when we hit the code path that calls `pcre2_match()`. When the alternative code path is used, `pcre2_jit_match()` is called, and it does _not_ print that error. Whatever the bug in libpcre2 that causes the JIT code path to fail on the Unicode validation, it points to the problem in this code in `compile_pcre2_pattern()`: -- snip -- if (is_utf8_locale() && has_non_ascii(p->pattern)) options |= PCRE2_UTF; -- snap -- It only asks whether there is a non-ASCII character in pattern, but we never bother to see whether the haystack is also encoded in UTF-8. In this case, it is not... Ciao, Dscho P.S.: Yes, yes, I know, we should run PCRE2 with JIT, and I am trying to figure out why it is not enabled on Windows when I specifically asked `./configure` to enable it... Investigating now. -