Re: [PATCH 7/7] shortlog: implement `--group=trailer` in terms of `--group=<format>`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 10, 2022 at 08:34:21PM -0400, Taylor Blau wrote:

> In the same spirit as the previous commit, reimplement
> `--group=trailer:<key>` as a special case of `--group=<format>`, too.
> 
> Unsurprisingly, this reimplementation is a little bit more complicated
> than the previous two. The complexity stems from having to enumerate
> each of the trailers one-by-one, as well as delegating control to
> `parse_ident()` to handle whether or not to show an individual's email.
> 
> To enumerate each trailer in a commit, we set the separator used in the
> pretty format to be a NUL byte. This hack allows us to treat the strbuf
> from `format_commit_message()` as an array of strings.

Hmph. So that's _probably_ OK, but I have to wonder if this is going too
far. The resulting code is not that much shorter, and IMHO is actually a
little more complicated, because of this hack and the extra util bit.

I also would be curious if there is any speed difference between the
two.

I spent a fair bit of time optimizing the regular author/committer paths
(and I think if we restore the skip-the-dedup logic I mentioned earlier
for those, they should be equivalent, as they already use
format_commit_message()). But the internals of the trailer code are
already so slow I doubt it makes much of a difference either way here.

The one thing that could hurt is that multiple trailers now require
multiple format calls, which may load the commit object for each one. I
think _probably_ not because we'd hopefully have it cached in the commit
slab via the save_commit_buffer mechanism. Though I guess if we used the
commit graph in the first place, we would need to load it. We'd also
likely reencode, but that's a noop on utf8 commits, so it's probably not
worth worrying about.

-Peff

PS That last paragraph gives me the suspicion that there's some
   low-hanging optimization fruit here: we want save_commit_buffer on in
   general, but after we visit each commit, we probably should call
   free_commit_buffer() on it to get rid of it, like we do for git-log.
   Probably shortlog is keeping all of the commit messages in memory at
   once (though of course not if the commit-graph is in use).

   If the commit-graph _is_ in use, there's probably the opposite
   optimization we could be doing: we should make sure to load it once,
   then do all of the formats, and then unuse it.

   Those don't have to be part of your series, but might be worth
   looking into while you're in the area.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux