Re: [PATCH v13 3/3] grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 19, 2021 at 8:08 AM René Scharfe <l.s.r@xxxxxx> wrote:
>
> Am 19.11.21 um 08:00 schrieb Ævar Arnfjörð Bjarmason:
> >
> > On Thu, Nov 18 2021, René Scharfe wrote:
> >
> >> Am 18.11.21 um 19:17 schrieb Junio C Hamano:
> >>> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
> >> [...]
> >>> I guess this is a lot of change in the amount of text involved but
> >>> the least amount of actual change in the behaviour.  For those with
> >>> newer PCREv2, the behaviour would be the same as v2.34.0, and for
> >>> others, the behaviour would be the same as v2.33.0.
> >>>
> >>> Having said all that, because the consensus seems to be that the
> >>> whole "when we should match in UTF mode" may need to be rethought, I
> >>> think reverting Hamza's [v13 3/3] would be the simplest way to clean
> >>> up the mess for v2.34.1 that will give us a cleaner slate to later
> >>> build on, than applying this patch.
> >>
> >> Makes sense to me.  It gives a better starting point to solve the issue
> >> afresh without getting entangled in mind-melting boolean expressions.
> >
> > Yes, agreed. As noted I haven't had time to dig deeply into this, but
> > from what I've seen so far there doesn't seem to be any obvious way
> > forward in terms of a quick fix.
> >
> > I thought perhaps your patch would be that (but I haven't looked into it
> > carefully enough), but since you're on-board with reverting & retrying.
>
> That patch should fix the edge case without any side-effects -- at least
> I haven't seen any reports of ill effects that would apply to it.

Since it isn't restricted to log, it will still cause a regression to
the `git grep` case with binary data for versions of PCRE2 older than
10.34 and unlike the previous one it might not trigger an error in the
testsuite just because we are missing a test for it.

> It's
> easier to understand and reason about when applied after reverting, I
> think.  But it's only for grep.c and I don't know the situation in t/.

We had been focusing in PCRE in this discussion, but I see the strict
behaviour of older PCRE2 as just a "coal mine canary" to point to the
bigger problem that we are expecting git's regex to handle safely and
correctly from the point of UTF, what is technically a binary match
with --color making the mismatch obvious.

The issue is not unique to PCRE, and seems Ævar also acknowledges[1]
that by seeing the same bug this was attempting to fix with probably
some version of glibc's ERE.  I suspect FreeBSD's (and derivatives) is
also broken and might be throwing REGILLSEQ errors as well, so I think
that it is better to revert the whole thing.

Carlo

[1] https://lore.kernel.org/git/211119.86r1bc4om5.gmgdl@xxxxxxxxxxxxxxxxxxx/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux