[PATCH 1/2] grep/pcre2: use PCRE2_UTF even with ASCII patterns

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



compile_pcre2_pattern() currently uses the option PCRE2_UTF only for
patterns with non-ASCII characters.  Patterns with ASCII wildcards can
match non-ASCII strings, though.  Without that option PCRE2 mishandles
UTF-8 input, though -- it matches parts of multi-byte characters.  Fix
that by using PCRE2_UTF even for ASCII-only patterns.

This is a remake of the reverted ae39ba431a (grep/pcre2: fix an edge
case concerning ascii patterns and UTF-8 data, 2021-10-15).  The change
to the condition and the test are simplified and more targeted.

Original-patch-by: Hamza Mahfooz <someguy@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: René Scharfe <l.s.r@xxxxxx>
---
 grep.c                          | 2 +-
 t/t7812-grep-icase-non-ascii.sh | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index fe847a0111..5badb6d851 100644
--- a/grep.c
+++ b/grep.c
@@ -382,7 +382,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
+	if (!opt->ignore_locale && is_utf8_locale() &&
 	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);

diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index e5d1e4ea68..ca3f24f807 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -123,4 +123,10 @@ test_expect_success GETTEXT_LOCALE,LIBPCRE2,PCRE2_MATCH_INVALID_UTF 'PCRE v2: gr
 	test_cmp invalid-0xe5 actual
 '

+test_expect_success GETTEXT_LOCALE,LIBPCRE2 'PCRE v2: grep non-literal ASCII from UTF-8' '
+	git grep --perl-regexp -h -o -e ll. file >actual &&
+	echo "lló" >expected &&
+	test_cmp expected actual
+'
+
 test_done
--
2.34.0




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux