On Wed, Oct 16, 2024 at 11:48:24AM +0200, Wolfgang Müller wrote: > On 2024-10-16 10:57, Patrick Steinhardt wrote: > > Given that we do set `log.abbrev` I think we should be hitting code > > paths in git-shortlog(1) that use it. `git shortlog --format=%h` for > > example would use `log.abbrev`, wouldn't it? It would be nice to > > figure out whether this can be made to misbehave based on which object > > hash we have in the input. > > I dove into the code again and now I'm fairly sure custom formatting is > only ever done when in a repository. shortlog_output() itself, called at > the end of cmd_shortlog(), doesn't do any formatting, only possibly > wrapping the lines already present in the shortlog struct. > > That struct is filled either by read_from_stdin() or get_from_rev(). The > latter is only ever called when in a repository: > > [...] Thanks; I agree with your analysis here. > So whilst we parse all the relevant options like --abbrev and --format, > we take a shortcut through read_from_stdin() and never get to apply a > custom format. Commit hashes from stdin are discarded. > > I'm not sure a test case for different hash algorithms would test > anything meaningful here, unless the plan in the future is to have > git-shortlog(1) support formatting when reading from stdin. I think that in general it would be difficult to support the full range of --format specifiers when operating outside of a repository, because we don't have all of the information necessary to assemble all of the possible formatting options. For instance, let's say I want to take Patrick's example to test 'git shortlog' with '--format="%H"' outside of the repository. There's no way to disambiguate whether, say, a SHA-256 hash is either (a) a correctly formatted SHA-256 hash, or (b) a corrupted / too-long SHA-1 hash. So that means that '%H', '%h', '%T', and '%t' are off the table. '%an' and '%ae' seem reasonable to implement, but '%aN' and '%aE' less so, because we don't have a .mailmap file to read. The same goes for the committer variants of all of those. I don't think there is any reasonable interpretation of '%d'/'%D', and likewise for '%(decorate)' as well as '%(describe)'. We could probably go on, but I am getting tired of looking through the 'PRETTY FORMATS' section of git-log(1) and trying to figure out how they'd work (or not) without a repository ;-). In any event, my feeling is that while we could probably implement a handful of these formatting options, that it would likely be kind of awkward to do so. Not to mention the user-visible awkwardness of supporting some '--format' specifiers but not others[^1]. So I think that the best course of action would be to document the limitation and move on ;-). Thanks, Taylor [^1]: Playing devil's advocate, though, perhaps it is OK to document well which formatting options do and don't work, and accept that a user asking for '--format="%(describe)"' (etc.) outside of a repository is nonsensical and warn / return nothing appropriately.