Compatibility between GNU and Git grep -P

In <https://lists.gnu.org/r/grep-devel/2023-04/msg00017.html> Carlo Marcelo Arenas Belón wrote:

> After using this for a while think the following will be better suited
> for a release because:
>
> * the unreleased PCRE2 code is still changing and is unlikely to be released
>    for a couple of months.
> * the current way to configure PCRE2 make it difficult to link with the
>    unreleased code (this might be an independent bug), but it is likely that
>    the wrong headers might be used by mistake.
> * the tests and documentation were not completely accurate.

Thanks for looking into this. I'm concerned about the resulting patches, though, because I see recent activity on the Git grep -P side here:

https://lore.kernel.org/git/xmqqzgaf2zpt.fsf@gitster.g/

Bleeding-edge (i.e., "master") GNU grep uses PCRE2_UCP | PCRE2_EXTRA_ASCII_BSD with unreleased PCRE2 (which introduces PCRE2_EXTRA_ASCII_BSD), and it uses neither flag with the current PCRE2 release. You're proposing to change GNU grep to never use either flag, regardless of PCRE2 release.

In contrast, bleeding-edge (i.e., "next") Git grep -P always uses PCRE2_UCP and never uses PCRE2_EXTRA_ASCII_BSD. I.e., it disagrees with GNU grep regardless of whether your proposed changes were adopted.

Given Jim's strong desire that \d should match only ASCII digits, I doubt whether GNU grep will simply use PCRE2_UCP without PCRE2_EXTRA_ASCII_BSD.
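
To make the \d concern concrete, here is a minimal sketch in C of how the two flags change what \d accepts. It is only a model: the helper names are hypothetical, and the digit table is a tiny hand-picked sample (ASCII, Arabic-Indic, Devanagari) rather than the full Unicode tables that PCRE2 consults.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASCII digits only: what \d matches without PCRE2_UCP, and also
   with PCRE2_UCP when PCRE2_EXTRA_ASCII_BSD is in effect.  */
static bool is_ascii_digit(uint32_t cp)
{
	return cp >= '0' && cp <= '9';
}

/* Illustrative, incomplete sample of Unicode decimal digits
   (general category Nd).  Not PCRE2's actual table.  */
static bool is_sample_unicode_digit(uint32_t cp)
{
	return is_ascii_digit(cp)
		|| (cp >= 0x0660 && cp <= 0x0669)   /* Arabic-Indic digits */
		|| (cp >= 0x0966 && cp <= 0x096F);  /* Devanagari digits */
}

/* Hypothetical helper: would \d match code point cp, given whether
   the UCP and ASCII_BSD behaviors are enabled?  */
static bool backslash_d_matches(uint32_t cp, bool ucp, bool ascii_bsd)
{
	if (ucp && !ascii_bsd)
		return is_sample_unicode_digit(cp);
	return is_ascii_digit(cp);
}
```

Under this model, backslash_d_matches(0x0663, true, false) is true (ARABIC-INDIC DIGIT THREE matches with PCRE2_UCP alone), while adding the ASCII_BSD behavior or dropping UCP makes it false.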

If we want the two grep -P's to stay compatible, I see two ways forward:

1. Leave GNU grep alone and modify Git grep to behave like GNU grep (see attached patch to Git).

2. Adopt your proposed change to GNU grep, and revert the recent change to Git grep so that it never uses PCRE2_UCP.
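
The comparison above can be sketched as a small C model of which of the two flags each variant would request. The bit values and function names below are placeholders invented for this sketch, not definitions from <pcre2.h>, and the model assumes a UTF-8 locale and a non-literal pattern on the Git side.

```c
#include <stdbool.h>

/* Placeholder bits for this sketch; NOT the values from <pcre2.h>.
   In released PCRE2, PCRE2_EXTRA_* options are set with
   pcre2_set_compile_extra_options() rather than in the main option
   word; this model ignores that distinction.  */
enum {
	SKETCH_UCP       = 1 << 0,  /* stands in for PCRE2_UCP */
	SKETCH_ASCII_BSD = 1 << 1   /* stands in for PCRE2_EXTRA_ASCII_BSD */
};

/* GNU grep "master": both flags if the library has ASCII_BSD, else neither. */
static unsigned gnu_grep_master(bool have_ascii_bsd)
{
	return have_ascii_bsd ? (SKETCH_UCP | SKETCH_ASCII_BSD) : 0;
}

/* GNU grep with the proposed change: never either flag. */
static unsigned gnu_grep_proposed(bool have_ascii_bsd)
{
	(void)have_ascii_bsd;
	return 0;
}

/* Git grep "next": always UCP, never ASCII_BSD. */
static unsigned git_grep_next(bool have_ascii_bsd)
{
	(void)have_ascii_bsd;
	return SKETCH_UCP;
}

/* Way forward 1: Git grep patched to behave like GNU grep "master". */
static unsigned git_grep_option1(bool have_ascii_bsd)
{
	return have_ascii_bsd ? (SKETCH_UCP | SKETCH_ASCII_BSD) : 0;
}

/* Way forward 2: Git grep reverted so that it never uses UCP. */
static unsigned git_grep_option2(bool have_ascii_bsd)
{
	(void)have_ascii_bsd;
	return 0;
}
```

In this model, git_grep_next() differs from both GNU variants whether or not the library provides the flag, while gnu_grep_master() equals git_grep_option1() and gnu_grep_proposed() equals git_grep_option2() in both cases.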

Either way, we should see what the Git folks say about this.
From 5f5e54157a01c540bde02c305c8ee5e1a39d4f1c Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@xxxxxxxxxxx>
Date: Fri, 21 Apr 2023 14:06:25 -0700
Subject: [PATCH] grep: be compatible with GNU grep -P

Use PCRE2_UCP only when PCRE2_EXTRA_ASCII_BSD is defined,
for compatibility with GNU grep.
---
 grep.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/grep.c b/grep.c
index 073559f2cd..e9dc8dc0bc 100644
--- a/grep.c
+++ b/grep.c
@@ -320,8 +320,13 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && !literal)
-		options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
+	if (!opt->ignore_locale && is_utf8_locale() && !literal) {
+		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
+#ifdef PCRE2_EXTRA_ASCII_BSD
+		/* Be compatible with GNU grep -P '\d'.  */
+		options |= (PCRE2_UCP | PCRE2_EXTRA_ASCII_BSD);
+#endif
+	}
 
 #ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
 	/*
-- 
2.39.2

