Re: Oddidies in the .mailmap parser & future syntax extensions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 10 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> [Changed subject]
>
> [jc: culled CC addresses]
>
>> I'd expect:
>>
>>     Foo <foo@xxxxxxxxxxx> Bar
>>
>> To be an alias/shorthand for:
>>
>>     Foo <foo@xxxxxxxxxxx> Bar <foo@xxxxxxxxxxx>
>
> OK.
>
>> More annoying is that this:
>>
>>     New <foo@xxxxxxxxxxx> <bar@xxxxxxxxxxx>
>>     <foo@xxxxxxxxxxx> <zar@xxxxxxxxxxx>
>>
>> Doesn't mean the same as:
>> ...
>> I.e. I'd expect the name to map to the empty string, *unless* we saw an
>> earlier address, i.e. just as we do for the first bar -> foo line (we
>> map it to a name of "New", we don't map it to an empty name).
>
> You expect the first one to map (anyname, <bar@xxxxxxxxxxx>) to
> ("New", <foo@xxxxxxxxxxx>) and you describe the second one does not
> map the human-readable part to "New", but it is unclear what the
> code does, or why you expect it to map to "" (or what your
> expectation is, for that matter, exactly---do you want an empty
> string, or do you want "New", or something else???).
>
> FWIW, if we were designing it from scratch, I'd expect the second
> one to map (anyname, <zar@xxxxxxxxxxx>) to ($1, <foo@xxxxxxxxxxx>),
> keeping the human-readable part as-is and only map the e-mail part.

Yes, that's what it does now. I.e. we'll create two mappings from those
lines, namely (in line-by-line regex-like pseudosyntax):

    [[(.*), bar@xxxxxxxxxxx>] => [New, foo@xxxxxxxxxxx]]
    [[(.*), zar@xxxxxxxxxxx>] => [$1, foo@xxxxxxxxxxx]]

I.e. for the second one we'll retain whatever name we found there.

> Or do you expect that when these two entries appear together, the
> first entry with "New" is carried over to the second entry?

Yes, I'd expect it to be stateful and implicitly do:

    [[(.*), bar@xxxxxxxxxxx>] => [New, foo@xxxxxxxxxxx]]
    [[(.*), zar@xxxxxxxxxxx>] => [find_last_name_for_address_in_mailmap(find_last_explicit_) || $1, foo@xxxxxxxxxxx]]

I.e. we'd map the entries for you in git.git like:
    
    diff --git a/.mailmap b/.mailmap
    index 9c6a446bdfb..2a884981e9d 100644
    --- a/.mailmap
    +++ b/.mailmap
    @@ -127,8 +127,8 @@ Julian Phillips <julian@xxxxxxxxxxxxxxxxx> <jp3@xxxxxxxxxxxxxxxxx>
     Junio C Hamano <gitster@xxxxxxxxx> <gitster@xxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junkio@xxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junkio@xxxxxxxxxxx>
    +<gitster@xxxxxxxxx> <junio@xxxxxxxxxxxxxxx>
    +<gitster@xxxxxxxxx> <junio@xxxxxxxxxx>
    +<gitster@xxxxxxxxx> <junio@xxxxxxxxx>
    +<gitster@xxxxxxxxx> <junio@xxxxxxxxxxx>
    +<gitster@xxxxxxxxx> <junkio@xxxxxxx>
    +<gitster@xxxxxxxxx> <junkio@xxxxxxxxxxx>

Which are currently mostly redundant, except for one old commit under
junio@xxxxxxxxxxx where the " C " wasn't present in your name field.

One might say that's a feature, and you'd like to be explicit about when
to map addresses and when to map names, i.e. if we were trying to
minimize the size of the .mailmap then this would be the most minimal
thing we can get away with currently:
    
    diff --git a/.mailmap b/.mailmap
    index 9c6a446bdfb..4668f9b32f2 100644
    --- a/.mailmap
    +++ b/.mailmap
    @@ -127,8 +127,8 @@ Julian Phillips <julian@xxxxxxxxxxxxxxxxx> <jp3@xxxxxxxxxxxxxxxxx>
     Junio C Hamano <gitster@xxxxxxxxx> <gitster@xxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxx>
    +<gitster@xxxxxxxxx> <junio@xxxxxxxxxxxxxxx>
    +<gitster@xxxxxxxxx> <junio@xxxxxxxxxx>
    +<gitster@xxxxxxxxx> <junio@xxxxxxxxx>
     Junio C Hamano <gitster@xxxxxxxxx> <junio@xxxxxxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junkio@xxxxxxx>
    -Junio C Hamano <gitster@xxxxxxxxx> <junkio@xxxxxxxxxxx>
    +<gitster@xxxxxxxxx> <junkio@xxxxxxx>
    +<gitster@xxxxxxxxx> <junkio@xxxxxxxxxxx>

But I just can't imagine cases where the proposed shorthand isn't useful
for everyone.

Who wants to use mailmap, *and* map one e-mail address to another, *and*
has an entry explicitly mapping the name, but *would* mind having git be
auto-smart and follow the chain of that explicit name mapping if there's
an entry after that with an an e-mail -> that-earlier-email mapping?

I think the answer is "nobody" and it would be unambiguously helpful,
but maybe I'm wrong.
    
>> Doing that would be strictly backwards compatible, i.e. now we'll
>> entirely ignore the 3rd E-Mail address. It does mean we also
>> accidentally support things like:
>>
>>     New <foo@xxxxxxxxxxx> <bar@xxxxxxxxxxx> # A comment, because we ignore everything after the 2nd address
>>
>> But don't tell anyone I told you that :) But that is something that
>> might technically have inadvertently closed the door to future syntax
>> extensions, but we could probably do them anyway, or at worst have some
>> heuristic.
>
> I vaguely recall that it was not an accident but a deliberate
> feature to allow comments, but don't tell anyone I told you that.

Which works, right until someone has an old name entry that looks like a
comment :)




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux