Re: [PATCH] grep: skip UTF8 checks explicitly

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 25 Jul 2019 06:02:54 -0700

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

>> OK, in short, barfing and stopping is a problem, but that flag is
>> not the right knob to tweak.  And the right knob ...
>>
>> >  1) We're oversupplying PCRE2_UTF now, and one such case is what's being
>> >     reported here. I.e. there's no reason I can think of for why a
>> >     fixed-string pattern should need PCRE2_UTF set when not combined
>> >     with --ignore-case. We can just not do that, but maybe I'm missing
>> >     something there.
>> >
>> >  2) We can do "try utf8, and fallback". A more advanced version of this
>> >     is what the new PCRE2_MATCH_INVALID_UTF flag (mentioned upthread)
>> >     does. I was thinking something closer to just carrying two compiled
>> >     patterns, and falling back on the ~PCRE2_UTF one if we get a
>> >     PCRE2_ERROR_UTF8_* error.
>>
>> ... lies somewhere along that line.  I think that is very sensible.
>
> I am glad that everybody agrees with my original comment on ab/no-kwset
> where I suggested that we should use our knowledge of the encoding of
> the haystack and convert it to UTF-8 if we detect that the pattern is
> UTF-8 encoded,...

Please do not count me among "everybody", then.  I did not think
that Ævar meant to iconv the haystack when I wrote the message you
are responding to, but if that was what he meant, I would not have
said "very sensible".
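
For reference, the "carry two compiled patterns and fall back"
idea from point 2) quoted above could be sketched roughly as
follows. This is only an illustration and not git's grep code: it
uses Python's re module as a stand-in for PCRE2, a strict UTF-8
decode as a stand-in for the PCRE2_ERROR_UTF8_* check, and the
FallbackMatcher class is a hypothetical name made up for the
example.

```python
import re


class FallbackMatcher:
    """Hypothetical helper: one pattern, compiled twice.

    self.utf plays the role of a pattern compiled with PCRE2_UTF,
    self.raw the role of the same pattern compiled without it.
    """

    def __init__(self, pattern: str, fixed: bool = False):
        # For a fixed-string search, escape regex metacharacters first.
        source = re.escape(pattern) if fixed else pattern
        self.utf = re.compile(source)                  # matches code points
        self.raw = re.compile(source.encode("utf-8"))  # matches raw bytes

    def search(self, haystack: bytes) -> bool:
        try:
            # Strict decode: raises on any invalid UTF-8 sequence,
            # standing in for the matcher reporting a UTF-8 error.
            text = haystack.decode("utf-8")
        except UnicodeDecodeError:
            # Fall back to the byte-oriented pattern instead of barfing.
            return self.raw.search(haystack) is not None
        return self.utf.search(text) is not None


matcher = FallbackMatcher("Ævar", fixed=True)
valid = "Hello Ævar".encode("utf-8")
broken = b"\xff\xfe " + "Ævar".encode("utf-8")  # invalid UTF-8 prefix
latin1 = "Hello Ævar".encode("latin-1")         # same text, other encoding

print(matcher.search(valid), matcher.search(broken), matcher.search(latin1))
```

Note that the fallback never re-encodes the haystack. The Latin-1
line above is searched byte-for-byte with the UTF-8 bytes of the
pattern, so it does not match. That is the difference between this
approach and converting the haystack with iconv.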