This v3 has a new patch (3/10) that I believe fixes the regression on
MinGW Johannes noted in
https://public-inbox.org/git/nycvar.QRO.7.76.6.1907011515150.44@xxxxxxxxxxxxxxxxx/

As noted in the updated commit message in 10/10 I believe just
skipping this test & documenting this in a commit message is the least
amount of suck for now. It's really an existing issue with us doing
nothing sensible when the log/grep haystack encoding doesn't match the
needle encoding supplied via the command line.

We swept that under the carpet with the kwset backend, but PCRE v2
exposes it.

Ævar Arnfjörð Bjarmason (10):
  log tests: test regex backends in "--encode=<enc>" tests
  grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"
  t4210: skip more command-line encoding tests on MinGW
  grep: inline the return value of a function call used only once
  grep tests: move "grep binary" alongside the rest
  grep tests: move binary pattern tests into their own file
  grep: make the behavior for NUL-byte in patterns sane
  grep: drop support for \0 in --fixed-strings <pattern>
  grep: remove the kwset optimization
  grep: use PCRE v2 for optimized fixed-string search

 Documentation/git-grep.txt                    |  17 +++
 grep.c                                        | 115 +++++++---------
 grep.h                                        |   3 +-
 revision.c                                    |   3 +
 t/t4210-log-i18n.sh                           |  41 +++++-
 ...a1.sh => t7008-filter-branch-null-sha1.sh} |   0
 ...08-grep-binary.sh => t7815-grep-binary.sh} | 101 --------------
 t/t7816-grep-binary-pattern.sh                | 127 ++++++++++++++++++
 8 files changed, 234 insertions(+), 173 deletions(-)
 rename t/{t7009-filter-branch-null-sha1.sh => t7008-filter-branch-null-sha1.sh} (100%)
 rename t/{t7008-grep-binary.sh => t7815-grep-binary.sh} (55%)
 create mode 100755 t/t7816-grep-binary-pattern.sh

Range-diff:
 1:  cfc01f49d3 =  1:  cfc01f49d3 log tests: test regex backends in "--encode=<enc>" tests
 2:  4b59eb32f0 =  2:  4b59eb32f0 grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"
 -:  ---------- >  3:  676c76afe4 t4210: skip more command-line encoding tests on MinGW
 3:  cc4d3b50d5 =  4:  da9b491f70 grep: inline the return value of a function call used only once
 4:  d9b29bdd89 =  5:  c42d3268fa grep tests: move "grep binary" alongside the rest
 5:  f85614f435 =  6:  36b9c1c541 grep tests: move binary pattern tests into their own file
 6:  90afca8707 =  7:  3c54e782e6 grep: make the behavior for NUL-byte in patterns sane
 7:  526b925fdc =  8:  8e5f418189 grep: drop support for \0 in --fixed-strings <pattern>
 8:  14269bb295 =  9:  d1cb8319d5 grep: remove the kwset optimization
 9:  c0fd75d102 ! 10:  4de0c82314 grep: use PCRE v2 for optimized fixed-string search
    @@ -15,6 +15,15 @@
         makes the behavior harder to understand and document, and makes tests
         for the different backends more painful.
     
    +    This does change the behavior under non-C locales when "log"'s
    +    "--encoding" option is used and the haystack/needle in the
    +    content/command-line doesn't have a matching encoding. See the recent
    +    change in "t4210: skip more command-line encoding tests on MinGW" in
    +    this series. I think that's OK. We did nothing sensible before
    +    then (just compared raw bytes that had no hope of matching). At least
    +    now the user will get some idea why their grep/log never matches in
    +    that edge case.
    +
         I could also support the PCRE v1 backend here, but that would make the
         code more complex. I'd rather aim for simplicity here and in future
         changes to the diffcore. We're not going to have someone who

-- 
2.22.0.455.g172b71a6c5
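
For readers who want to see the haystack/needle mismatch concretely, here
is a minimal sketch in Python (not git code). The sample string and the
variable names are made up for illustration. It assumes the needle arrives
from the command line as UTF-8 while the haystack has been re-encoded to
ISO-8859-1, roughly the situation "log --encoding=<non-utf8>" creates.
Python's UTF-8 decoder only stands in for a UTF-8-validating matcher; it
says nothing about how PCRE v2 actually reports the problem.

```python
# Illustration only: a UTF-8 needle searched in an ISO-8859-1 haystack.
text = "grep: \u00e6"                      # "æ" as a sample non-ASCII char

needle = "\u00e6".encode("utf-8")          # b'\xc3\xa6', as typed in a UTF-8 shell
haystack = text.encode("iso-8859-1")       # b'grep: \xe6', re-encoded log output

# Raw byte comparison: the two encodings of "æ" share no bytes, so no match.
raw_match = needle in haystack

# Re-encoding the needle into the haystack's encoding makes the search work.
reencoded_needle = needle.decode("utf-8").encode("iso-8859-1")
reencoded_match = reencoded_needle in haystack

# A matcher that insists on valid UTF-8 input would reject this haystack
# outright, which at least tells the user that something is off.
try:
    haystack.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(raw_match, reencoded_match, valid_utf8)
```

The point is only that a byte-for-byte search across two encodings can
never succeed for non-ASCII text, whichever backend performs it.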