On Wed, Sep 22 2021, Jeff King wrote: > On Tue, Sep 21, 2021 at 11:02:31PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> >> On Tue, Sep 21 2021, Jeff King wrote: >> >> > On Tue, Sep 21, 2021 at 09:37:23AM +0200, René Scharfe wrote: >> > >> >> > @@ -965,9 +953,12 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, >> >> > bol += len; >> >> > switch (p->field) { >> >> > case GREP_HEADER_AUTHOR: >> >> > - case GREP_HEADER_COMMITTER: >> >> > - strip_timestamp(bol, &eol); >> >> > + case GREP_HEADER_COMMITTER: { >> >> > + char *em = memrchr(bol, '>', eol - bol); >> >> > + if (em) >> >> > + eol = em + 1; >> >> >> >> The old code documents the intent via the function name. The new one >> >> goes into the nitty-gritty without further explanation, which I find >> >> harder to read. >> > >> > Agreed. I do think the conversion is functionally correct, but it >> > doesn't strike me as worth the change. >> >> As far as some general improvement in thish area it seems to me that >> this whole subthread is losing the forest for the trees. > > I'd definitely agree with that. :) > >> It probably makes sense to split up that commit.c code into something >> that can give you structured output, i.e. headers with types and >> start/end points for interesting data, then in grep.c we won't need a >> strip_anything(), or strrchr() or memrchr() or whatever. > > I don't disagree with any of this, either, but I think it's a separate > (and much more complicated) topic than what this series is dealing with. > So my preference would be to take this as an immediate improvement, and > let anything like that get built on top. > >> It would also be a lot faster for grepping if we could offload more of >> this work to the regex engine, particularly if we've got a more capable >> engine like PCREv2. >> >> In many cases we're splitting lines ourselves, when we could have the >> engine work in a multi-line mode, or translate the user's --author match >> into something that can match the raw commit header. So have an implicit >> /^author /m anchor if we're matching author headers, instead of having >> grep.c string-twiddle that around one line at a time. > > Ditto here. Multi-line matching may make things a lot more efficient, > but I think is out of scope for this series. That seems like an > interesting area for future work. Indeed, *way* outside this series. I'm I think the change you suggested here makes sense for this change, just pointing out that for any follow-up changes it's much more worthwhile to consider a few more stackframes than just the 1-2 that involve those strings in that particular form.