Re: [RFC PATCH v2] builtin/shortlog: explicitly set hash algo when there is no repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 16, 2024 at 11:48:24AM +0200, Wolfgang Müller wrote:
> On 2024-10-16 10:57, Patrick Steinhardt wrote:
> > Given that we do set `log.abbrev` I think we should be hitting code
> > paths in git-shortlog(1) that use it. `git shortlog --format=%h` for
> > example would use `log.abbrev`, wouldn't it? It would be nice to
> > figure out whether this can be made to misbehave based on which object
> > hash we have in the input.
>
> I dove into the code again and now I'm fairly sure custom formatting is
> only ever done when in a repository. shortlog_output() itself, called at
> the end of cmd_shortlog(), doesn't do any formatting, only possibly
> wrapping the lines already present in the shortlog struct.
>
> That struct is filled either by read_from_stdin() or get_from_rev(). The
> latter is only ever called when in a repository:
>
> [...]

Thanks; I agree with your analysis here.

> So whilst we parse all the relevant options like --abbrev and --format,
> we take a shortcut through read_from_stdin() and never get to apply a
> custom format. Commit hashes from stdin are discarded.
>
> I'm not sure a test case for different hash algorithms would test
> anything meaningful here, unless the plan in the future is to have
> git-shortlog(1) support formatting when reading from stdin.

I think that in general it would be difficult to support the full range
of --format specifiers when operating outside of a repository, because
we don't have all of the information necessary to assemble all of the
possible formatting options.

For instance, let's say I want to take Patrick's example to test 'git
shortlog' with '--format="%H"' outside of the repository. There's no way
to disambiguate whether, say, a SHA-256 hash is either (a) a correctly
formatted SHA-256 hash, or (b) a corrupted / too-long SHA-1 hash.

So that means that '%H', '%h', '%T', and '%t' are off the table. '%an'
and '%ae' seem reasonable to implement, but '%aN' and '%aE' less so,
because we don't have a .mailmap file to read. The same goes for the
committer variants of all of those.

I don't think there is any reasonable interpretation of '%d'/'%D', and
likewise for '%(decorate)' as well as '%(describe)'. We could probably
go on, but I am getting tired of looking through the 'PRETTY FORMATS'
section of git-log(1) and trying to figure out how they'd work (or not)
without a repository ;-).

In any event, my feeling is that while we could probably implement a
handful of these formatting options, that it would likely be kind of
awkward to do so. Not to mention the user-visible awkwardness of
supporting some '--format' specifiers but not others[^1].

So I think that the best course of action would be to document the
limitation and move on ;-).

Thanks,
Taylor

[^1]: Playing devil's advocate, though, perhaps it is OK to document
  well which formatting options do and don't work, and accept that a
  user asking for '--format="%(describe)"' (etc.) outside of a
  repository is nonsensical and warn / return nothing appropriately.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux