Re: [PATCH] grep: skip UTF8 checks explicitly

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> The PCRE2_NO_UTF_CHECK flag means "I have checked that this is a valid
> UTF-8 string so you, PCRE, don't need to re-check it".

OK, in short, barfing and stopping is a problem, but that flag is
not the right knob to tweak.  And the right knob ...

>  1) We're oversupplying PCRE2_UTF now, and one such case is what's being
>     reported here. I.e. there's no reason I can think of for why a
>     fixed-string pattern should need PCRE2_UTF set when not combined
>     with --ignore-case. We can just not do that, but maybe I'm missing
>     something there.
>
>  2) We can do "try utf8, and fallback". A more advanced version of this
>     is what the new PCRE2_MATCH_INVALID_UTF flag (mentioned upthread)
>     does. I was thinking something closer to just carrying two compiled
>     patterns, and falling back on the ~PCRE2_UTF one if we get a
>     PCRE2_ERROR_UTF8_* error.

... lies somewhere along that line.  I think that is very sensible.
Let's make sure this gets sorted out soonish.

Thanks.
