Re: [PATCH v4 3/4] ident: rename commit_rewrite_person() to apply_mailmap_to_header()

On Wed, Jul 13, 2022 at 3:39 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> On Tue, Jul 12 2022, Siddharth Asthana wrote:

> > diff --git a/revision.c b/revision.c
> > index 14dca903b6..6ad3665204 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -3792,7 +3792,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
> >               if (!buf.len)
> >                       strbuf_addstr(&buf, message);
> >
> > -             commit_rewrite_person(&buf, commit_headers, opt->mailmap);
> > +             apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
> >       }
> >
> >       /* Append "fake" message parts as needed */
>
> I can live with this so far, but I really think this is cementing the
> wrong approach into place here.
>
> We only use commit_match() to feed a commit to grep.c, which if you look
> at the "header_field" struct there we take this pre-formatted output and
> parse this out *again*, i.e. find "author", "reflog", "committer" etc.,
> and eventually point the regex engine at that buffer.
>
> So we really don't need to get a strbuf here, and munge the whole thing
> in place to feed it to grep.c, instead we can:
>
>  1. Not munge it at all, pass it as-is
>  2. Pass the mailmap along to grep.c itself
>  3. It's already parsing out the headers, so at some point it will have
>     "author foo <bar>\n"
>  4. In that code, we can just consult the mailmap, and then map the "foo
>     <bar>" part to "Baz <bar>" or whatever
>  5. Then search that string.
>
> So no need for any in-place rewriting, or no?

This patch series is about improving `git cat-file`, and it seems
far-fetched to ask it to first rewrite how grep handles the mailmap.

> Even with this approach this seems a bit odd, e.g. isn't your
> commit_rewrite_person() largely a re-invention of find_commit_header()
> in commit.c, can't we use that function there?

find_commit_header() seems to be searching for only one header, while
we want to search for more than one. Also we want only one pass to be
made over the object buffer. So I think we cannot really reuse
find_commit_header().
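
To make the "several headers, one pass" point concrete, here is a
minimal standalone sketch. It is not the code from this series and does
not use any git internals; count_matching_headers() is a hypothetical
name, and the header list is passed in as a NULL-terminated array. It
walks the header block once and checks each line against every wanted
header, stopping at the blank line that ends the headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): walk the header block of `buf`
 * once and count the lines that start with any name in the
 * NULL-terminated `headers` array, followed by a space.
 * The scan stops at the blank line that separates headers from body.
 */
static size_t count_matching_headers(const char *buf,
				     const char *const *headers)
{
	size_t found = 0;
	const char *line = buf;

	assert(buf && headers);

	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t i;

		/* One line is compared against all wanted headers. */
		for (i = 0; headers[i]; i++) {
			size_t n = strlen(headers[i]);

			if (!strncmp(line, headers[i], n) && line[n] == ' ') {
				found++;
				break;
			}
		}
		if (!eol)
			break; /* unterminated last line */
		line = eol + 1;
	}
	return found;
}
```

Calling a single-header lookup once per header would instead restart
the scan from the top of the buffer for each header name.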

> The replace_idents_using_mailmap() in 4/4 seems like it could be
> improved in a similar way.
>
> I.e. can't we just loop over the object, then as we find "author"
> consult the mailmap, and potentially emit a replacement, otherwise the
> existing content as-is up until the next \n etc.

That's what we do, except that we replace the existing ident in place
instead of emitting a replacement.
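
As an illustration of what replacing in place involves, here is a
standalone sketch on a plain NUL-terminated char buffer. It is not the
code from the patch; splice_in_place() is a hypothetical name, and the
caller is assumed to know the offset and length of the old ident. The
tail of the buffer only has to move when the new ident has a different
length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: replace buf[pos .. pos + old_len) with `repl`.
 * `*len` is the current string length (buf[*len] must be NUL) and
 * `cap` is the total size of `buf`. Returns 0 on success, or -1 if
 * the result would not fit; the buffer is left untouched in that case.
 */
static int splice_in_place(char *buf, size_t *len, size_t cap,
			   size_t pos, size_t old_len, const char *repl)
{
	size_t repl_len = strlen(repl);
	size_t tail, new_len;

	assert(pos + old_len <= *len);

	tail = *len - (pos + old_len);
	new_len = *len - old_len + repl_len;
	if (new_len + 1 > cap)
		return -1;

	/* Shift everything after the old ident, including the NUL. */
	memmove(buf + pos + repl_len, buf + pos + old_len, tail + 1);
	/* Drop the new ident into the gap. */
	memcpy(buf + pos, repl, repl_len);
	*len = new_len;
	return 0;
}
```

When no ident in the object needs replacing, nothing is moved or copied
at all.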

> We should be able to "stream" all of this, instead of in-place modifying
> a potentially large commit buffer, which involves memmove() etc.

I am not sure if streaming is really much better, especially if there
are a small number of commit or tag objects where an ident must be
replaced.
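
For comparison, a streaming version could look roughly like the sketch
below. This is only my reading of the suggestion, not code from anyone
in this thread: lookup_ident() is a hard-coded stand-in for a mailmap
lookup, stream_with_mailmap() is a hypothetical name, only the "author"
header is handled for brevity, and output goes to a fixed-size buffer
that is asserted to be large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a mailmap lookup: knows exactly one mapping. */
static const char *lookup_ident(const char *ident, size_t len)
{
	static const char old[] = "foo <bar>";

	if (len == sizeof(old) - 1 && !memcmp(ident, old, len))
		return "Baz <bar>";
	return NULL;
}

/* Append n bytes to out; the sketch aborts instead of growing. */
static size_t append(char *out, size_t pos, size_t cap,
		     const char *s, size_t n)
{
	assert(pos + n < cap);
	memcpy(out + pos, s, n);
	return pos + n;
}

/*
 * Copy `in` to `out` line by line, emitting a mapped ident for
 * "author" header lines. Returns the length of the output string.
 */
static size_t stream_with_mailmap(const char *in, char *out, size_t cap)
{
	size_t pos = 0;
	const char *line = in;
	int in_headers = 1;

	while (*line) {
		const char *eol = strchr(line, '\n');
		const char *end = eol ? eol + 1 : line + strlen(line);
		const char *gt = NULL, *repl = NULL;

		if (*line == '\n')
			in_headers = 0; /* blank line: body starts */
		if (in_headers && !strncmp(line, "author ", 7)) {
			gt = memchr(line + 7, '>', end - (line + 7));
			if (gt)
				repl = lookup_ident(line + 7, gt + 1 - (line + 7));
		}
		if (repl) {
			pos = append(out, pos, cap, line, 7);
			pos = append(out, pos, cap, repl, strlen(repl));
			pos = append(out, pos, cap, gt + 1, end - (gt + 1));
		} else {
			/* Unchanged lines are still copied byte for byte. */
			pos = append(out, pos, cap, line, end - line);
		}
		line = end;
	}
	out[pos] = '\0';
	return pos;
}
```

Note that every line is copied to the output even when nothing is
mapped, which is why I doubt it is a clear win when few objects need a
replacement.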



