On Wed, Jul 13, 2022 at 3:39 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> On Tue, Jul 12 2022, Siddharth Asthana wrote:
>
> > diff --git a/revision.c b/revision.c
> > index 14dca903b6..6ad3665204 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
> >               if (!buf.len)
> >                       strbuf_addstr(&buf, message);
> >
> > -             commit_rewrite_person(&buf, commit_headers, opt->mailmap);
> > +             apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
> >       }
> >
> >       /* Append "fake" message parts as needed */
>
> I can live with this so far, but I really think this is cementing the
> wrong approach into place here.
>
> We only use commit_match() to feed a commit to grep.c, which if you look
> at the "header_field" struct there we take this pre-formatted output and
> parse this out *again*, i.e. find "author", "reflog", "committer" etc.,
> and eventually point the regex engine at that buffer.
>
> So we really don't need to get a strbuf here, and munge the whole thing
> in place to feed it to grep.c, instead we can:
>
>  1. Not munge it at all, pass it as-is
>  2. Pass the mailmap along to grep.c itself
>  3. It's already parsing out the headers, so at some point it will have
>     "author foo <bar>\n"
>  4. In that code, we can just consult the mailmap, and then map the "foo
>     <bar>" part to "Baz <bar>" or whatever
>  5. Then search that string.
>
> So no need for any in-place rewriting, or no?

This patch series is about improving `git cat-file`, and it seems
far-fetched to ask it to first rewrite how grep handles mailmap.

> Even with this approach this seems a bit odd, e.g. isn't your
> commit_rewrite_person() largely a re-invention of find_commit_header()
> in commit.c, can't we use that function there?

find_commit_header() seems to search for only one header, while we
want to search for more than one. Also, we want to make only one pass
over the object buffer.
So I think we cannot really reuse find_commit_header().

> The replace_idents_using_mailmap() in 4/4 seems like it could be
> improved in a similar way.
>
> I.e. can't we just loop over the object, then as we find "author"
> consult the mailmap, and potentially emit a replacement, otherwise the
> existing content as-is up until the next \n etc.

That's what we do, except that we replace the existing ident in place
instead of emitting a replacement.

> We should be able to "stream" all of this, instead of in-place modifying
> a potentially large commit buffer, which involves memmove() etc.

I am not sure streaming is really much better, especially if only a
small number of commit or tag objects contain an ident that must be
replaced.
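
To make the comparison concrete, here is a rough sketch of what a
single-pass "streaming" rewrite over several headers might look like.
It is standalone C and does not use Git's strbuf or mailmap code: the
names stream_idents(), ident_header_len(), emit(), struct sink and
lookup_ident() are all hypothetical, and lookup_ident() is only a
hard-coded stand-in for a real mailmap lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink; a real version might write to a fd instead. */
struct sink { char *buf; size_t len, cap; };

/* Append n bytes, truncating if the sink is full; keeps buf NUL-terminated. */
static void emit(struct sink *s, const char *p, size_t n)
{
	if (n > s->cap - 1 - s->len)
		n = s->cap - 1 - s->len;
	memcpy(s->buf + s->len, p, n);
	s->len += n;
	s->buf[s->len] = '\0';
}

/* Hard-coded stand-in for a mailmap lookup (assumption, not Git's API). */
static const char *lookup_ident(const char *ident, size_t len)
{
	static const char old[] = "foo <bar>";

	if (len == sizeof(old) - 1 && !memcmp(ident, old, len))
		return "Baz <bar>";
	return NULL;
}

/* Length of a matching "author "/"committer "/"tagger " prefix, else 0. */
static size_t ident_header_len(const char *line, size_t len)
{
	static const char *headers[] = { "author ", "committer ", "tagger " };
	size_t i;

	for (i = 0; i < sizeof(headers) / sizeof(headers[0]); i++) {
		size_t n = strlen(headers[i]);
		if (len >= n && !memcmp(line, headers[i], n))
			return n;
	}
	return 0;
}

/*
 * Copy the object text `in` to `out` in one pass, emitting a mapped
 * ident for any matching header line and everything else as-is.
 * Returns the number of bytes written (excluding the NUL).
 */
size_t stream_idents(const char *in, char *out, size_t outsz)
{
	struct sink s = { out, 0, outsz };
	int in_headers = 1;

	assert(out && outsz > 0);
	out[0] = '\0';

	while (*in) {
		const char *eol = strchr(in, '\n');
		size_t len = eol ? (size_t)(eol - in) : strlen(in);
		size_t skip = in_headers ? ident_header_len(in, len) : 0;
		/* The ident runs from after the header name through '>'. */
		const char *gt = skip ? memchr(in + skip, '>', len - skip) : NULL;
		const char *mapped = gt ?
			lookup_ident(in + skip, (size_t)(gt + 1 - (in + skip))) : NULL;

		if (!len)
			in_headers = 0; /* blank line ends the header block */

		if (mapped) {
			emit(&s, in, skip);
			emit(&s, mapped, strlen(mapped));
			emit(&s, gt + 1, len - (size_t)(gt + 1 - in));
		} else {
			emit(&s, in, len);
		}
		if (!eol)
			break;
		emit(&s, "\n", 1);
		in = eol + 1;
	}
	return s.len;
}
```

Note that this sketch copies every byte of the object even when no
ident matches, whereas an in-place rewrite only pays for a memmove()
when a replacement actually changes the length. That is the trade-off
I had in mind above.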