[PATCH v3 00/10] grep: move from kwset to optional PCRE v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This v3 has a new patch (3/10) that I believe fixes the regression on
MinGW Johannes noted in
https://public-inbox.org/git/nycvar.QRO.7.76.6.1907011515150.44@xxxxxxxxxxxxxxxxx/

As noted in the updated commit message in 10/10 I believe just
skipping this test & documenting this in a commit message is the least
amount of suck for now. It's really an existing issue with us doing
nothing sensible when the log/grep haystack encoding doesn't match the
needle encoding supplied via the command line.

We swept that under the carpet with the kwset backend, but PCRE v2
exposes it.

Ævar Arnfjörð Bjarmason (10):
  log tests: test regex backends in "--encode=<enc>" tests
  grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"
  t4210: skip more command-line encoding tests on MinGW
  grep: inline the return value of a function call used only once
  grep tests: move "grep binary" alongside the rest
  grep tests: move binary pattern tests into their own file
  grep: make the behavior for NUL-byte in patterns sane
  grep: drop support for \0 in --fixed-strings <pattern>
  grep: remove the kwset optimization
  grep: use PCRE v2 for optimized fixed-string search

 Documentation/git-grep.txt                    |  17 +++
 grep.c                                        | 115 +++++++---------
 grep.h                                        |   3 +-
 revision.c                                    |   3 +
 t/t4210-log-i18n.sh                           |  41 +++++-
 ...a1.sh => t7008-filter-branch-null-sha1.sh} |   0
 ...08-grep-binary.sh => t7815-grep-binary.sh} | 101 --------------
 t/t7816-grep-binary-pattern.sh                | 127 ++++++++++++++++++
 8 files changed, 234 insertions(+), 173 deletions(-)
 rename t/{t7009-filter-branch-null-sha1.sh => t7008-filter-branch-null-sha1.sh} (100%)
 rename t/{t7008-grep-binary.sh => t7815-grep-binary.sh} (55%)
 create mode 100755 t/t7816-grep-binary-pattern.sh

Range-diff:
 1:  cfc01f49d3 =  1:  cfc01f49d3 log tests: test regex backends in "--encode=<enc>" tests
 2:  4b59eb32f0 =  2:  4b59eb32f0 grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"
 -:  ---------- >  3:  676c76afe4 t4210: skip more command-line encoding tests on MinGW
 3:  cc4d3b50d5 =  4:  da9b491f70 grep: inline the return value of a function call used only once
 4:  d9b29bdd89 =  5:  c42d3268fa grep tests: move "grep binary" alongside the rest
 5:  f85614f435 =  6:  36b9c1c541 grep tests: move binary pattern tests into their own file
 6:  90afca8707 =  7:  3c54e782e6 grep: make the behavior for NUL-byte in patterns sane
 7:  526b925fdc =  8:  8e5f418189 grep: drop support for \0 in --fixed-strings <pattern>
 8:  14269bb295 =  9:  d1cb8319d5 grep: remove the kwset optimization
 9:  c0fd75d102 ! 10:  4de0c82314 grep: use PCRE v2 for optimized fixed-string search
    @@ -15,6 +15,15 @@
         makes the behavior harder to understand and document, and makes tests
         for the different backends more painful.
     
    +    This does change the behavior under non-C locales when "log"'s
    +    "--encoding" option is used and the heystack/needle in the
    +    content/command-line doesn't have a matching encoding. See the recent
    +    change in "t4210: skip more command-line encoding tests on MinGW" in
    +    this series. I think that's OK. We did nothing sensible before
    +    then (just compared raw bytes that had no hope of matching). At least
    +    now the user will get some idea why their grep/log never matches in
    +    that edge case.
    +
         I could also support the PCRE v1 backend here, but that would make the
         code more complex. I'd rather aim for simplicity here and in future
         changes to the diffcore. We're not going to have someone who
-- 
2.22.0.455.g172b71a6c5




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux