[PATCH v2] grep: skip UTF8 checks explicitly

18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was released
with git 2.10 added the PCRE_UTF8 flag to PCRE1 matching including a
call to has_non_ascii() to try to avoid breakage if there was non-utf8
encoded content in the haystack.

Usually PCRE is compiled with JIT support (even if it is not the default),
and therefore the codepath used includes calling pcre_jit_exec, which
skips UTF-8 validation by design (which might result in crashes or hangs).
When JIT support wasn't compiled in, we use pcre_exec instead, with the
possibility that grep might be aborted if invalid UTF-8 is found in the
haystack.

PCRE1 has provided a flag since Mar 5, 2007 that can be used to skip the
checks explicitly, so use that to make both codepaths equivalent (the
flag is ignored by pcre_jit_exec).

This fix is only implemented for PCRE1 because PCRE2 is likely to have
a better solution (without the risks) in the future.

Helped-by: Johannes Schindelin <Johannes.Schindelin@xxxxxx>
Helped-by: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
Suggested-by: Junio C Hamano <gitster@xxxxxxxxx>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@xxxxxxxxx>
---
V2:
* drop PCRE2 support
* add backward compatibility define
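
A note on the backward compatibility define, as a minimal standalone
sketch rather than the patch itself: defining a missing flag as 0 makes
OR-ing it into the match flags a no-op, so the same source builds
against headers that predate the flag. The SKETCH_* names and the
sketch_match_flags() helper are hypothetical stand-ins (the real
constants come from <pcre.h>), and the numeric value is a placeholder.

```c
/* Placeholder value for illustration; not taken from <pcre.h>. */
#define SKETCH_NOTBOL 0x0080

/*
 * Pretend the library header is too old to know the flag.
 * Falling back to 0 keeps every "flags |= ..." and every
 * initializer valid while changing nothing at runtime.
 */
#ifndef SKETCH_NO_UTF8_CHECK
#define SKETCH_NO_UTF8_CHECK 0
#endif

/* Hypothetical helper: build the option word passed to the matcher. */
int sketch_match_flags(int not_bol)
{
	int flags = SKETCH_NO_UTF8_CHECK;

	if (not_bol)
		flags |= SKETCH_NOTBOL;
	return flags;
}
```

With the fallback in effect the helper returns only the bits that the
header really provides, which is the behavior the grep.h hunk relies on.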

 grep.c | 4 ++--
 grep.h | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/grep.c b/grep.c
index f7c3a5803e..69ef69516e 100644
--- a/grep.c
+++ b/grep.c
@@ -421,7 +421,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 		regmatch_t *match, int eflags)
 {
-	int ovector[30], ret, flags = 0;
+	int ovector[30], ret, flags = PCRE_NO_UTF8_CHECK;
 
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
diff --git a/grep.h b/grep.h
index 1875880f37..9c8797a017 100644
--- a/grep.h
+++ b/grep.h
@@ -3,6 +3,9 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include <pcre.h>
+#ifndef PCRE_NO_UTF8_CHECK
+#define PCRE_NO_UTF8_CHECK 0
+#endif
 #ifdef PCRE_CONFIG_JIT
 #if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
 #ifndef NO_LIBPCRE1_JIT
-- 
2.23.0


