Re: [PATCH v13 3/3] grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data

On 16.11.21 at 10:38, Carlo Arenas wrote:
> On Tue, Nov 16, 2021 at 1:30 AM Andreas Schwab <schwab@xxxxxxxxxxxxxx> wrote:
>>
>> expecting success of 7812.13 'PCRE v2: grep ASCII from invalid UTF-8 data':
>>         git grep -h "var" invalid-0x80 >actual &&
>>         test_cmp expected actual &&
>>         git grep -h "(*NO_JIT)var" invalid-0x80 >actual &&
>>         test_cmp expected actual
>>
>> ++ git grep -h var invalid-0x80
>> ++ test_cmp expected actual
>> ++ test 2 -ne 2
>> ++ eval 'diff -u' '"$@"'
>> +++ diff -u expected actual
>> ++ git grep -h '(*NO_JIT)var' invalid-0x80
>> fatal: pcre2_match failed with error code -22: UTF-8 error: isolated byte with 0x80 bit set
>
> That is exactly what I was worried about, this is not failing one
> test, but making `git grep` unusable in any repository that has any
> binary files that might be reachable by it, and it is likely affecting
> anyone using PCRE older than 10.34

Let's have a look at the map.  Here are the differences between the
versions regarding use of PCRE2_UTF:

o: opt->ignore_locale
h: has_non_ascii(p->pattern)
i: is_utf8_locale()
l: !opt->ignore_case && (p->fixed || p->is_fixed)

o h i l master hamza rene2
0 0 0 0      0     1     0
0 0 0 1      0     1     0
0 0 1 0      0     1     1
0 0 1 1      0     1     0  <== 7812.13, confirmed using fprintf() debugging
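
For anyone who wants to replay the map, here is a minimal sketch in C.
The three predicates are assumptions: boolean expressions reconstructed
so that they agree with the columns above, not copies of the code in
any of the three versions, which may be written differently.  The
struct and function names are made up for illustration.

```c
#include <stdbool.h>

/* The four inputs, named after the legend above. */
struct utf_inputs {
	bool o; /* opt->ignore_locale */
	bool h; /* has_non_ascii(p->pattern) */
	bool i; /* is_utf8_locale() */
	bool l; /* !opt->ignore_case && (p->fixed || p->is_fixed) */
};

/* ASSUMPTION: reconstructed to agree with the "master" column. */
static bool utf_master(struct utf_inputs in)
{
	return !in.o && in.i && in.h && !in.l;
}

/* ASSUMPTION: reconstructed to agree with the "hamza" column. */
static bool utf_hamza(struct utf_inputs in)
{
	return !in.o && (!in.h || (in.i && !in.l));
}

/* ASSUMPTION: reconstructed to agree with the "rene2" column. */
static bool utf_rene2(struct utf_inputs in)
{
	return !in.o && in.i && !in.l;
}

/* Count the input combinations on which the three variants disagree. */
static int count_differences(void)
{
	int n = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct utf_inputs in = {
			.o = bits & 8, .h = bits & 4,
			.i = bits & 2, .l = bits & 1
		};
		bool m = utf_master(in);

		if (m != utf_hamza(in) || m != utf_rene2(in))
			n++;
	}
	return n;
}
```

Under these assumed expressions the variants disagree only when o and h
are both 0, which are the four rows listed; for every other combination
all three give the same answer.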

So http://public-inbox.org/git/0ea73e7a-6d43-e223-ab2e-24c684102856@xxxxxx/
should not have this breakage, because it doesn't enable PCRE2_UTF for
literal patterns.

René



