On Tue, Sep 21 2021, Jeff King wrote: > On Tue, Sep 21, 2021 at 09:37:23AM +0200, René Scharfe wrote: > >> > @@ -965,9 +953,12 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, >> > bol += len; >> > switch (p->field) { >> > case GREP_HEADER_AUTHOR: >> > - case GREP_HEADER_COMMITTER: >> > - strip_timestamp(bol, &eol); >> > + case GREP_HEADER_COMMITTER: { >> > + char *em = memrchr(bol, '>', eol - bol); >> > + if (em) >> > + eol = em + 1; >> >> The old code documents the intent via the function name. The new one >> goes into the nitty-gritty without further explanation, which I find >> harder to read. > > Agreed. I do think the conversion is functionally correct, but it > doesn't strike me as worth the change. As far as some general improvement in thish area it seems to me that this whole subthread is losing the forest for the trees. We're operating a buffer that has "\n"'s in it, and then we loop and split it up one line at a time, often just to match author OR committer. We then have things like commit_rewrite_person() in revision.c that's preparing a "fake" commit just for grep.c's use, just for it to strip those headers down again. A real improvement in this area IMO would be trying to bridge the gap between parse_commit_buffer() and grep.c's match_one_pattern(). I.e. we've even parsed this once before in commit.c's parse_commit_date() by the time it gets to grep.c, now we're just doing it again. It probably makes sense to split up that commit.c code into something that can give you structured output, i.e. headers with types and start/end points for interesting data, then in grep.c we won't need a strip_anything(), or strrchr() or memrchr() or whatever. It would also be a lot faster for grepping if we could offload more of this work to the regex engine, particularly if we've got a more capable engine like PCREv2. In many cases we're splitting lines ourselves, when we could have the engine work in a multi-line mode, or translate the user's --author match into something that can match the raw commit header. So have an implicit /^author /m anchor if we're matching author headers, instead of having grep.c string-twiddle that around one line at a time. Just my 0.02.