Re: [PATCH v4 2/2] grep/pcre2: better support invalid UTF-8 haystacks

On Sun, Jan 24 2021, Ramsay Jones wrote:

> On 24/01/2021 14:49, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Sun, Jan 24 2021, Ramsay Jones wrote:
>> 
>>> On 24/01/2021 13:53, Ramsay Jones wrote:
>>> [snip]
>>>
>>>>> diff --git a/grep.c b/grep.c
>>>>> index efeb6dc58d..e329f19877 100644
>>>>> --- a/grep.c
>>>>> +++ b/grep.c
>>>>> @@ -492,7 +492,13 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>>>>>  	}
>>>>>  	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
>>>>>  	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
>>>>> -		options |= PCRE2_UTF;
>>>>> +		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
>>>>> +
>>>>> +	if (PCRE2_MATCH_INVALID_UTF &&
>>>>> +	    options & (PCRE2_UTF | PCRE2_CASELESS) &&
>>>>> +	    !(PCRE2_MAJOR >= 10 && PCRE2_MAJOR >= 36))
>>>>                                    ^^^^^^^^^^^^^^^^^^
>>>> I assume that this should be s/_MAJOR/_MINOR/. ;-)
>>>>
>> 
>> Oops on the s/MAJOR/MINOR/g. Well spotted, I think I'll wait a bit more
>> for other comments for a re-roll.
>> 
>> Perhaps Junio can be kind and do the s/_MAJOR/_MINOR/ fixup in the
>> meantime to save me from spamming the list too much...
>
> Umm, sorry for not making myself clear, _just_ changing MAJOR to
> MINOR is insufficient.
>
>> 
>> FWIW I have tested this on a version without PCRE2_MATCH_INVALID_UTF, but
>> I think I did that by manually editing the "PCRE2_UTF" line above, and
>> then wrote this bug.
>
> Yep, I seem to have 10.34 on Linux Mint 20.1 (based on Ubuntu 20.04).
>
>> 
>>> Although, perhaps you want:
>>>
>>>             !(((PCRE2_MAJOR * 100) + PCRE2_MINOR) >= 1036)
>>>
>>> ... or something similar.
>> 
>> Probably better to use pcre2_config(PCRE2_CONFIG_VERSION) at that point
>> and versioncmp() the string.
>
> OK, but it needs to be 'something similar' (try putting, say, MAJOR 11
> and MINOR 0->35 in your expression).

Ah yes, of course. I re-rolled a v5 with a fix for that. Thanks.
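
For illustration, the problem pointed out above can be sketched in plain C.
This is a minimal sketch with hypothetical helper names (naive_at_least_10_36,
packed_at_least_10_36 and version_at_least are made up for this example); it
is not the code from v5, and it takes the version numbers as plain integers
rather than reading PCRE2_MAJOR and PCRE2_MINOR from pcre2.h:

```c
#include <stdio.h>

/*
 * The check as it would read after only doing s/_MAJOR/_MINOR/:
 * both components are compared independently, so a newer major
 * version with a small minor (e.g. 11.0) is wrongly rejected.
 */
int naive_at_least_10_36(int major, int minor)
{
	return major >= 10 && minor >= 36;
}

/*
 * The packed form suggested in the thread. It assumes the minor
 * version always stays below 100.
 */
int packed_at_least_10_36(int major, int minor)
{
	return ((major * 100) + minor) >= 1036;
}

/*
 * Hypothetical general helper: compare the major version first and
 * only look at the minor version when the majors are equal.
 */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
	if (major != want_major)
		return major > want_major;
	return minor >= want_minor;
}

/* Print the result of all three checks for one version. */
void show_version_checks(int major, int minor)
{
	printf("%d.%d: naive=%d packed=%d ordered=%d\n",
	       major, minor,
	       naive_at_least_10_36(major, minor),
	       packed_at_least_10_36(major, minor),
	       version_at_least(major, minor, 10, 36));
}
```

Calling show_version_checks(11, 0) shows the disagreement: the naive check
says "too old" while the other two accept the version. The checks agree for
10.34 and 10.36.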
