On Mon, Sep 20 2021, Jeff King wrote:

> While discussing [1], I noticed that the grep code mostly takes
> non-const buffers, even though it is conceptually a read-only operation
> to search in them. The culprit is a handful of spots that temporarily
> tie off NUL-terminated strings by overwriting a byte of the buffer and
> then restoring it. But I think we no longer need to do so these days,
> now that we have a regexec_buf() that can take a ptr/size pair.
>
> The first three patches are a bit repetitive, but I broke them up
> individually because they're the high-risk part. I.e., if my assumptions
> about needing the NUL are wrong, it could introduce a bug. But based on
> my reading of the code, plus running the test suite with ASan/UBSan, I
> feel reasonably confident.
>
> The last two are the bigger cleanups, but should obviously avoid any
> behavior changes.
>
>   [1/5]: grep: stop modifying buffer in strip_timestamp
>   [2/5]: grep: stop modifying buffer in show_line()
>   [3/5]: grep: stop modifying buffer in grep_source_1()
>   [4/5]: grep: mark "haystack" buffers as const
>   [5/5]: grep: store grep_source buffer as const
>
>  grep.c | 87 +++++++++++++++++++++++++++++-----------------------------
>  grep.h |  4 +--
>  2 files changed, 45 insertions(+), 46 deletions(-)
>
> -Peff
>
> [1] https://lore.kernel.org/git/YUk3zwuse56v76ze@xxxxxxxxxxxxxxxxxxxxxxx/

This whole thing looks good to me. I only found a small whitespace nit
in one of the patches.

Did you consider following up by having this code take const char */
const size_t pairs? E.g. starting with something like the below.

When this API is called, it's called with such a pair, and the regex
functions at the bottom expect one too, but we have all the bol/eol
twiddling in the middle, which is often confusing because some
functions pass the pointers along as-is, and some modify them.

So not for now, but I think rolling with what I started here below
would make sense for this file eventually:

diff --git a/grep.c b/grep.c
index 14fe8a0fd23..f55ec5c0e09 100644
--- a/grep.c
+++ b/grep.c
@@ -436,7 +436,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	}
 }
 
-static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+static int pcre2match(struct grep_pat *p, const char *line, const size_t len,
 		regmatch_t *match, int eflags)
 {
 	int ret, flags = 0;
@@ -448,11 +448,11 @@ static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
 
 	if (p->pcre2_jit_on)
 		ret = pcre2_jit_match(p->pcre2_pattern, (unsigned char *)line,
-				eol - line, 0, flags, p->pcre2_match_data,
+				len, 0, flags, p->pcre2_match_data,
 				NULL);
 	else
 		ret = pcre2_match(p->pcre2_pattern, (unsigned char *)line,
-				eol - line, 0, flags, p->pcre2_match_data,
+				len, 0, flags, p->pcre2_match_data,
 				NULL);
 
 	if (ret < 0 && ret != PCRE2_ERROR_NOMATCH) {
@@ -909,15 +909,15 @@ static void show_name(struct grep_opt *opt, const char *name)
 }
 
 static int patmatch(struct grep_pat *p,
-		    const char *line, const char *eol,
+		    const char *line, const size_t len,
 		    regmatch_t *match, int eflags)
 {
 	int hit;
 
 	if (p->pcre2_pattern)
-		hit = !pcre2match(p, line, eol, match, eflags);
+		hit = !pcre2match(p, line, len, match, eflags);
 	else
-		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
+		hit = !regexec_buf(&p->regexp, line, len, 1, match,
 				   eflags);
 
 	return hit;
@@ -976,7 +976,7 @@ static int match_one_pattern(struct grep_pat *p,
 	}
 
  again:
-	hit = patmatch(p, bol, eol, pmatch, eflags);
+	hit = patmatch(p, bol, eol - bol, pmatch, eflags);
 	if (hit && p->word_regexp) {
 		if ((pmatch[0].rm_so < 0) ||
@@ -1447,7 +1447,7 @@ static int look_ahead(struct grep_opt *opt,
 		int hit;
 		regmatch_t m;
 
-		hit = patmatch(p, bol, bol + *left_p, &m, 0);
+		hit = patmatch(p, bol, *left_p, &m, 0);
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)
 			continue;
 		if (earliest < 0 || m.rm_so < earliest)
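
To illustrate the ptr/len convention outside of git, here is a minimal
standalone sketch. It assumes a regex library that supports the
non-POSIX REG_STARTEND flag (glibc and the BSDs do; some other libcs do
not), which as far as I know is what regexec_buf() relies on. The names
match_buf() and line_has() are made up for this example and are not
functions in grep.c.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match "re" against exactly "len" bytes starting
 * at "line". The buffer is const and need not be NUL-terminated.
 * Assumes REG_STARTEND is available (a non-POSIX extension).
 * Returns 1 on a hit, 0 otherwise.
 */
static int match_buf(const regex_t *re, const char *line, size_t len,
		     regmatch_t *match)
{
	assert(re && line && match);
	/* With REG_STARTEND, rm_so/rm_eo delimit the bytes to search. */
	match->rm_so = 0;
	match->rm_eo = (regoff_t)len;
	return regexec(re, line, 1, match, REG_STARTEND) == 0;
}

/*
 * Hypothetical caller: search one line of a larger read-only buffer.
 * The bol/eol pair is converted to ptr/len once, at the boundary.
 * Returns 1 on a hit, 0 on no hit, -1 if the pattern fails to compile.
 */
static int line_has(const char *pattern, const char *bol, const char *eol)
{
	regex_t re;
	regmatch_t m;
	int hit;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	hit = match_buf(&re, bol, (size_t)(eol - bol), &m);
	regfree(&re);
	return hit;
}
```

Given a buffer holding "foo\nbar\n", a caller can search either line
by passing its bol and eol, without writing a NUL into the buffer
first.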