Re: Allow "git shortlog" to group by committer information

Jeff King <peff@xxxxxxxx> · Tue, 11 Oct 2016 15:17:13 -0400

On Tue, Oct 11, 2016 at 12:07:40PM -0700, Linus Torvalds wrote:

> On Tue, Oct 11, 2016 at 12:01 PM, Jeff King <peff@xxxxxxxx> wrote:
> >
> > My implementation is a little more complicated because it's also setting
> > things up for grouping by trailers (so you can group by "signed-off-by",
> > for example). I don't know if that's useful to you or not.
> 
> Hmm. Maybe in theory. But probably not in reality - it's just not
> unique enough (ie there are generally multiple, and if you choose the
> first/last, it should be the same as author/committer, so it doesn't
> actually add anything).

The implementation I did credited each commit multiple times if the
trailer appeared more than once. If you want to play with it, you can
fetch it from:

  git://github.com/peff jk/shortlog-ident

and then something like:

  git shortlog --ident=reviewed-by --format='...reviewed %an'

works. I haven't found it to really be useful for more than toy
statistic gathering, though.
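
To make the counting rule concrete, here is a rough Python sketch of
"credit each commit once per matching trailer". It is not the code on
that branch: the function name credit_by_trailer is made up for
illustration, and it scans every line of the message instead of only
the trailer block at the end.

```python
from collections import Counter


def credit_by_trailer(messages, key):
    """Count credits per trailer value across commit messages.

    A commit carrying N lines of the form "<key>: value" contributes
    N credits, one per line. Hypothetical helper, for illustration only.
    """
    prefix = key.lower() + ":"
    counts = Counter()
    for msg in messages:
        for line in msg.splitlines():
            # Trailer keys are matched case-insensitively here.
            if line.lower().startswith(prefix):
                counts[line[len(prefix):].strip()] += 1
    return counts
```

With this rule a commit reviewed by two people shows up under both
names, so the per-person totals can add up to more than the number of
commits.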

> There are possibly other things that *could* be grouped by and might be useful:
> 
>  - main subdirectory it touches (I've often wanted that)
> 
>  - rough size of diff or number of files it touches
> 
> but realistically both are painful enough that it probably doesn't
> make sense to do in some low-level helper.

Yeah, I think there's a lot of policy there in what counts as "main",
the rough sizes, etc. I've definitely done queries like that before, but
usually by piping "log --numstat" into perl. It's a minor pain to get
the data into perl data structures, but once you have it, you have a lot
more flexibility in what you can compute.
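
For illustration, the same kind of query can be sketched in Python
instead of perl. Everything here is an assumption rather than anything
git provides: the input is expected to come from something like
"git log --numstat --format=@%H", the function names are invented, and
"main subdirectory" is arbitrarily defined as the top-level directory
with the most added plus deleted lines. Rename notation in paths is not
handled.

```python
from collections import Counter


def parse_numstat(text):
    """Map commit id -> list of (added, deleted, path).

    Assumes each commit starts with a line "@<id>" followed by
    tab-separated numstat lines "added<TAB>deleted<TAB>path".
    """
    commits = {}
    current = None
    for line in text.splitlines():
        if line.startswith("@"):
            current = line[1:]
            commits[current] = []
        elif line.strip() and current is not None:
            added, deleted, path = line.split("\t", 2)
            # Binary files are reported as "-" for both counts.
            a = 0 if added == "-" else int(added)
            d = 0 if deleted == "-" else int(deleted)
            commits[current].append((a, d, path))
    return commits


def main_subdir(files):
    """Return the top-level directory with the most changed lines.

    Files at the repository root are grouped under ".". This is one
    possible policy, chosen only for the example.
    """
    totals = Counter()
    for added, deleted, path in files:
        top = path.split("/", 1)[0] if "/" in path else "."
        totals[top] += added + deleted
    return totals.most_common(1)[0][0] if totals else None
```

Once the data is in a dict like this, swapping in a different notion of
"main" (most files touched, say) is a small change to main_subdir.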

That might be aided by providing more structured machine-readable output
from git, like JSON (which I don't particularly like, but it's kind of a
standard, and it sure as hell beats XML). But obviously that's another
topic entirely.

> > I'm fine with this less invasive version, but a few suggestions:
> >
> >  - do you want to call it --group-by=committer (with --group-by=author
> >    as the default), which could later extend naturally to other forms of
> >    grouping?
> 
> Honestly, it's probably the more generic one, but especially for
> one-off commands that aren't that common, it's a pain to write. When
> testing it, I literally just used "-c" for that reason.

It's not the end of the world to call it "-c" now, and later define "-c"
as a shorthand for "--group-by=committer", if and when the latter comes
into existence.

Keep in mind that shortlog takes arbitrary revision options, too, and
"-c" is defined there for combined diffs. I can't think of a good reason
to want to pass it to shortlog, though, so I don't think it's a big
loss.

-Peff