Re: [PATCH v4 2/2] grep/pcre2: better support invalid UTF-8 haystacks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jan 24 2021, Ramsay Jones wrote:

> On 24/01/2021 13:53, Ramsay Jones wrote:
> [snip]
>
>>> diff --git a/grep.c b/grep.c
>>> index efeb6dc58d..e329f19877 100644
>>> --- a/grep.c
>>> +++ b/grep.c
>>> @@ -492,7 +492,13 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>>>  	}
>>>  	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
>>>  	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
>>> -		options |= PCRE2_UTF;
>>> +		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
>>> +
>>> +	if (PCRE2_MATCH_INVALID_UTF &&
>>> +	    options & (PCRE2_UTF | PCRE2_CASELESS) &&
>>> +	    !(PCRE2_MAJOR >= 10 && PCRE2_MAJOR >= 36))
>>                                    ^^^^^^^^^^^^^^^^^^
>> I assume that this should be s/_MAJOR/_MINOR/. ;-)
>> 

Oops on the s/MAJOR/MINOR/g. Well spotted, I think I'll wait a bit more
for other comments for a re-roll.

Perhaps Junio can be kind and do the s/_MAJOR/_MINOR/ fixup in the
meantime to save be from spamming the list too much...

FWIW I have tested this on a verion without PCRE2_MATCH_INVALID_UTF, but
I think I did that by manually editing the "PCRE2_UTF" line above, and
then wrote this bug.

> Although, perhaps you want:
>
>             !(((PCRE2_MAJOR * 100) + PCRE2_MINOR) >= 1036)
>
> ... or something similar.

Probably better to use pcre2_config(PCRE2_CONFIG_VERSION) at that point
and versioncmp() the string.

FWIW we used this pattern (I added) before with PCRE, see e87de7cab4
(grep: un-break building with PCRE < 8.32, 2017-05-25).

And if you want to be clever ((PCRE2_MAJOR) | ((PCRE2_MINOR) << 16)) is
probably better, then you can get it out with bit operations :)





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux