[Changed subject]

On Fri, Sep 10 2021, Gwyneth Morgan wrote:

> On 2021-09-10 14:02:36+0100, Fangyi Zhou wrote:
>> Similar to a35b13fce0 (Update .mailmap, 2018-11-09).
>>
>> This patch makes the output of `git shortlog -nse v2.10.0..master`
>> duplicate-free by taking/guessing the current and preferred
>> addresses for authors that appear with more than one address.
>
> The line for Jessica Clarke should probably just be
>
>     Jessica Clarke <jrtc27@xxxxxxxxxx>
>
> That works the same and doesn't put a reference to an old name.

It does work exactly the same! More specifically this is an
unintentional bug/misfeature/looseness in the .mailmap parser, an entry
like:

    Foo <foo@xxxxxxxxxxx> Bar

Is exactly equivalent to:

    Foo <foo@xxxxxxxxxxx>

I.e. we simply ignore the " Bar" part. The reason for this is that we're
internally treating nonsense input as if the line simply ended there.

Even having documented and tested some of this recently in 05b5ff219c2
(mailmap doc + tests: add better examples & test them, 2021-01-12) I
found this a bit surprising. I probably found out at the time, but
forgot and had to go source spelunking again.

I'd expect:

    Foo <foo@xxxxxxxxxxx> Bar

To be an alias/shorthand for:

    Foo <foo@xxxxxxxxxxx> Bar <foo@xxxxxxxxxxx>

Which is something that might be applicable / useful in some cases.
E.g. a name might change over time from "Foo", to "Bar", to "Zar", but
just because we're at "Bar" and want to map "Foo" to "Bar", that might
not mean that we'd like to map any future name at the same address
(i.e. the future "Zar") to the same "Foo".

In practice I suspect that's more commonly what people do want to do,
maybe we should warn about it, I did mean to hook some pedantic mode of
the parser at some point up to git-fsck.

More annoying is that this:

    New <foo@xxxxxxxxxxx> <bar@xxxxxxxxxxx>
    <foo@xxxxxxxxxxx> <zar@xxxxxxxxxxx>

Doesn't mean the same as:

    New <foo@xxxxxxxxxxx> <bar@xxxxxxxxxxx>
    New <foo@xxxxxxxxxxx> <zar@xxxxxxxxxxx>

I.e.
I'd expect the name to map to the empty string, *unless* we saw an
earlier address, i.e. just as we do for the first bar -> foo line (we
map it to a name of "New", we don't map it to an empty name).

So that's some #leftoverbits, perhaps someone somewhere relies on that,
but it seems like an obvious shorthand to have. I can't imagine it being
useful to map to empty names, and much of e.g. git.git's mailmap is
repeated entries with the same name over and over again.

I suppose we could also extend it to new syntax such as:

    New <foo@xxxxxxxxxxx> <bar@xxxxxxxxxxx> <zar@xxxxxxxxxxx>

Doing that would be strictly backwards compatible, i.e. now we'll
entirely ignore the 3rd E-Mail address. It does mean we also
accidentally support things like:

    New <foo@xxxxxxxxxxx> <bar@xxxxxxxxxxx> # A comment, because we ignore everything after the 2nd address

But don't tell anyone I told you that :)

But that is something that might technically have inadvertently closed
the door to future syntax extensions, but we could probably do them
anyway, or at worst have some heuristic.

Another useful thing might be to support:

    New <> Old <>

As an explicit mapping of the name "Old" wherever we see it to "New",
or:

    New <> Old <>

To change just the name "Old" to "New" everywhere, without considering
the E-Mail address.

Both of those are probably too crazy to be useful, especially since if
we supported that we'd logically also support:

    New <> <>

To assign all the commits to the name "New", but retain the address.
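
To make the looseness described above a bit more concrete, here's a
rough Python model of the behavior as I've described it. This is only a
sketch, not git's actual C code: the names `Entry`, `_name_and_email`
and `parse_mailmap_line` are made up for illustration, and the
example.org addresses are placeholders. It reads at most two
"Name <email>" pairs from a line and silently drops whatever is left
over.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """One parsed mailmap line (hypothetical structure for this sketch)."""
    new_name: Optional[str]
    new_email: str
    old_name: Optional[str]
    old_email: Optional[str]


def _name_and_email(text: str) -> Optional[Tuple[Optional[str], str, str]]:
    """Return (name, email, rest) for the first '<...>' in text, else None."""
    left = text.find("<")
    if left == -1:
        return None
    right = text.find(">", left + 1)
    if right == -1:
        return None
    # Whatever precedes '<' is the name; an empty name becomes None.
    name = text[:left].strip() or None
    return name, text[left + 1:right], text[right + 1:]


def parse_mailmap_line(line: str) -> Optional[Entry]:
    """Model of the loose parsing: two pairs at most, the rest is ignored."""
    if line.startswith("#"):
        return None
    first = _name_and_email(line)
    if first is None or not first[1]:
        # Assumption of this sketch: no usable first address, skip the line.
        return None
    new_name, new_email, rest = first
    second = _name_and_email(rest)
    if second is None:
        # Trailing text without an address (e.g. " Bar") is treated as
        # if the line had simply ended after the first address.
        return Entry(new_name, new_email, None, None)
    old_name, old_email, _ignored = second
    # Anything after the second address (a third address, a "comment")
    # is dropped without complaint.
    return Entry(new_name, new_email, old_name, old_email)


if __name__ == "__main__":
    a = parse_mailmap_line("Foo <foo@example.org> Bar")
    b = parse_mailmap_line("Foo <foo@example.org>")
    print(a == b)
```

In this model the "Foo <...> Bar" line and the plain "Foo <...>" line
come out as the same entry, and a third address or a trailing
"# comment" after the second address makes no difference either. A
pedantic mode would presumably warn at the two places where leftover
text is thrown away.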