Re: Compatibility between GNU and Git grep -P

On Fri, Apr 21, 2023 at 2:11 PM Paul Eggert <eggert@xxxxxxxxxxx> wrote:
>
> In <https://lists.gnu.org/r/grep-devel/2023-04/msg00017.html> Carlo
> Marcelo Arenas Belón wrote:
>
> > After using this for a while think the following will be better suited
> > for a release because:
> >
> > * the unreleased PCRE2 code is still changing and is unlikely to be released
> >    for a couple of months.
> > * the current way to configure PCRE2 make it difficult to link with the
> >    unreleased code (this might be an independent bug), but it is likely that
> >    the wrong headers might be used by mistake.
> > * the tests and documentation were not completely accurate.

Just to clarify: those points were made about the GNU grep codebase, so they
are not really relevant to git's, which has its own independent thread[1].
It would be better to use that thread instead to avoid more confusion.

> Thanks for looking into this. I'm concerned about the resulting patches,
> though, because I see recent activity in on the Git grep -P side here:
>
> https://lore.kernel.org/git/xmqqzgaf2zpt.fsf@gitster.g/

This is really not that recent, and has been released already with git
2.40, so at least at that point in time git and grep 3.9 were
consistent.  That was changed with grep 3.10 though.

FWIW, it doesn't seem git had any issues (other than the crasher with
PCRE2 10.34) with the transition to matching multibyte digits with
'\d', which is what perl (and therefore PCRE2) does.  But as I
explained in the other thread, I think it might be wise (in the context
of what git is usually matched against) not to do that in the long
term, and I was therefore working on adding the necessary features to
PCRE2 to make that possible.  Note that no decision has been made,
though, which is why I didn't even bother sending an RFC patch.
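
To make the behavioural difference concrete, here is a minimal sketch
that uses Python's re module as a stand-in.  This is an assumption made
purely for illustration: it is not PCRE2, and neither git nor GNU grep
is implemented this way.  Python's re.ASCII flag only plays a role
loosely analogous to restricting '\d' to ASCII digits, and the sample
text is made up.

```python
import re

# Hypothetical sample: ASCII digits followed by Arabic-Indic digits
# (U+0661..U+0663), which are multibyte in UTF-8.
text = "abc 123 \u0661\u0662\u0663"

# Unicode-aware \d (Python's default for str patterns) matches both runs,
# similar in spirit to what perl-style Unicode matching does.
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \d is limited to [0-9], so only the ASCII run matches.
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(unicode_digits)
print(ascii_digits)
```

The first list contains both digit runs and the second only "123",
which is the kind of divergence users would see between tools that
disagree on what '\d' means.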

> Given Jim's strong desire that \d should match only ASCII digits, I
> doubt whether GNU grep will simply use PCRE2_UCP without
> PCRE2_EXTRA_ASCII_BSD.

My assumption is that you would also need PCRE2_EXTRA_ASCII_DIGIT, and
indeed bleeding edge pcre2grep[2] had a compatibility option added
assuming as much.

> Either way, we should see what the Git folks say about this.

The proposed patch for git would IMHO just carry the same risk I was
trying to prevent with my proposed change to GNU grep.

There are no plans to release PCRE2 10.43, and based on its regular
cadence a release wouldn't happen for another couple of months, so this
code is a little premature and will need updating either way.

Carlo

[1] https://lore.kernel.org/git/2554712d-e386-3bab-bc6c-1f0e85d999db@xxxxxxxxxxx/
[2] https://github.com/PCRE2Project/pcre2/commit/3bbdb6dd713b39868934fdc978ba61b81da6d1c5



