Re: [PATCH v3 00/10] grep: move from kwset to optional PCRE v2

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 01 Jul 2019 14:31:49 -0700

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> This v3 has a new patch (3/10) that I believe fixes the regression on
> MinGW Johannes noted in
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1907011515150.44@xxxxxxxxxxxxxxxxx/
>
> As noted in the updated commit message in 10/10 I believe just
> skipping this test & documenting this in a commit message is the least
> amount of suck for now. It's really an existing issue with us doing
> nothing sensible when the log/grep haystack encoding doesn't match the
> needle encoding supplied via the command line.

Is that quite the case?  If they do not match, not finding the match
is the right answer, because we are byte-for-byte matching/searching
IIUC.
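
To illustrate what byte-for-byte matching implies here, a minimal
sketch (Python for brevity; this is not git's code, and the strings
and the helper name bytes_match are made up for the example).  It
assumes the pattern arrives as UTF-8 while one haystack is stored as
ISO-8859-1:

```python
# Illustration only: hypothetical data, not taken from git or its tests.
needle = "Ævar".encode("utf-8")                    # pattern as typed in a UTF-8 terminal
haystack_latin1 = "Author: Ævar".encode("iso-8859-1")
haystack_utf8 = "Author: Ævar".encode("utf-8")


def bytes_match(needle: bytes, haystack: bytes) -> bool:
    """Plain substring search over raw bytes, with no notion of encoding."""
    return needle in haystack


# "Æ" is the two bytes C3 86 in UTF-8 but the single byte C6 in ISO-8859-1,
# so the UTF-8 needle cannot occur in the ISO-8859-1 haystack.
mismatch = bytes_match(needle, haystack_latin1)
match = bytes_match(needle, haystack_utf8)
print(mismatch, match)
```

Under those assumptions the search against the ISO-8859-1 haystack
finds nothing, which is the "right answer" in the byte-for-byte sense.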

> We swept that under the carpet with the kwset backend, but PCRE v2
> exposes it.

Is it exposing, or just showing the limitation of the rewritten
implementation where it cannot do byte-for-byte matching/searching
as we used to be able to?

Without having a way to know what encoding is used on the command
line, there is no sensible way to reencode the pattern to match the
haystack encoding (even when that is known), so "you have to feed the
strings in the same encoding, as we are going to match/search
byte-for-byte" is the only sensible way to work, given the design
space, I would think.
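
A rough sketch of why reencoding is guesswork without that knowledge
(again Python, purely illustrative; the byte string and variable names
are assumptions for the example): the same command-line bytes decode
cleanly under more than one encoding, and each guess yields a
different pattern once converted to the haystack encoding.

```python
# Illustration only: hypothetical bytes as they might arrive on the command line.
raw = b"\xc3\x86var"

# Both decodes succeed, so the bytes alone do not reveal the encoding.
guess_utf8 = raw.decode("utf-8")          # one character followed by "var"
guess_latin1 = raw.decode("iso-8859-1")   # two characters followed by "var"

# Suppose the haystack is known to be ISO-8859-1; convert each guess to it.
pattern_if_utf8 = guess_utf8.encode("iso-8859-1")
pattern_if_latin1 = guess_latin1.encode("iso-8859-1")

# The two candidate patterns differ, so a wrong guess searches for the wrong bytes.
print(pattern_if_utf8, pattern_if_latin1)
```

Only the caller knows which guess is correct, hence the requirement
that the strings be fed in the same encoding as the haystack.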

Not that it is all that useful to be able to match/search
byte-for-byte, of course, so I am OK if we punt with these tests,
but I'd prefer to see us admit we are punting when we do ;-).