RE: Dealing with corporate email recycling

<rsbecker@xxxxxxxxxxxxx> · Sun, 13 Mar 2022 15:47:22 -0400

On March 13, 2022 1:53 PM, I wrote:
>To: 'brian m. carlson' <sandals@xxxxxxxxxxxxxxxxxxxx>; 'Sean Allred'
><allred.sean@xxxxxxxxx>
>Cc: git@xxxxxxxxxxxxxxx; sallred@xxxxxxxx; grmason@xxxxxxxx;
>sconrad@xxxxxxxx
>Subject: RE: Dealing with corporate email recycling
>
>On March 13, 2022 1:22 PM, brian m. carlson wrote:
>>On 2022-03-12 at 22:38:56, Sean Allred wrote:
>>> * Proposal: UUIDs
>>>
>>> To get what we want (i.e., the ability to run `git show HEAD~1`, know
>>> that Ada wrote it, and report her current contact information), we
>>> need some way of tracking identity over time.  A naive solution could
>>> be to extend the mailmap format as recognized by Git:
>>>
>>>     $ git cat-file blob HEAD~1:.mailmap
>>>     A. U. Thor <foo@xxxxxxxxxxx> [uuid A] <ada@xxxxxxxxxxx>
>>>
>>>     $ git cat-file blob HEAD:.mailmap
>>>     A. U. Thor <ada@xxxxxxxxxxx> [uuid A]
>>>     Roy G. Biv <foo@xxxxxxxxxxx> [uuid B] <roy@xxxxxxxxxxx>
>>>
>>> Now, when I run `git show HEAD~1`, Git would determine the UUID of
>>> the email on the commit using the mailmap version in that tree:
>>>
>>>     $ git -c mailmap.blob=HEAD~1:.mailmap check-mailmap --uuid "<foo@xxxxxxxxxxx>"
>>>     A
>>>
>>> Then, we can use that UUID to resolve to the current contact information:
>>>
>>>     $ git check-mailmap --uuid=A
>>>     A. U. Thor <ada@xxxxxxxxxxx>
>>>
>>> Mailmap-sensitive commands can use this logic internally -- possibly
>>> guarded under some new config setting.
>>
>>It's my intention to implement an approach where people's emails are
>>identified by a key fingerprint of some sort and then converted into
>>the proper email address by a mailmap that lives outside of the main
>>history.  That is, my email address might be
>>ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad@ssh-sha256.ns.git-scm.com,
>>and then we have a mailmap that converts between the two.  If you
>>wanted to have a UUID-based one, you could do
>>77c747a3-1599-4c8c-9569-f729c17632e6@xxxxxxxxxxxxxxxxxxx (assuming that namespace were registered).
>>
>>The benefit to the key part is that you can essentially prove that you
>>are who you say you are.  A UUID doesn't have the possibility.
>>
>>This was discussed briefly at some sort of contributor summit we had at
>>some point, but I've been busy and haven't gotten to it yet.  It is on
>>my list of projects, however.
>
>This could require a global and security-hardened tokenization or signing approach.
>Email fingerprints from one organization would have to be able to move to
>another organization easily, potentially as part of the git repo's metadata. I would
>not use the same key as is used for signing fingerprints (mostly out of paranoia),
>but this is conceptually similar to the public side of a key pair. One would have to
>have access to the private key in order to be a committer/author. Unfortunately,
>as it stands today, identities may be easily spoofed (--author, or GIT_COMMITTER_NAME
>and GIT_COMMITTER_EMAIL in the environment), so that part of the code would have to
>change, with safeguards on what can be supplied, something I think would be welcome.
>Keeping to a distributed philosophy is probably essential. Just my take on it.
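
To check my understanding of Sean's two-step lookup, here is a rough Python sketch. It assumes the `[uuid ...]` field exactly as written in his proposal (Git's real mailmap parser has no such field), and `parse_mailmap`, `resolve`, and the example.com addresses are placeholders I made up for illustration. It does not call Git at all.

```python
import re

# Matches one line of the *proposed* extended mailmap syntax:
#   Proper Name <proper@email> [uuid X] <commit@email>
# The "[uuid X]" field is hypothetical; stock Git does not understand it.
ENTRY = re.compile(
    r"^(?P<name>[^<]*?)\s*<(?P<proper>[^>]+)>"
    r"(?:\s*\[uuid (?P<uuid>[^\]]+)\])?"
    r"(?:\s*<(?P<commit>[^>]+)>)?\s*$"
)


def parse_mailmap(text):
    """Return (email -> uuid, uuid -> "Name <email>") for one mailmap blob."""
    email_to_uuid, uuid_to_contact = {}, {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        m = ENTRY.match(line) if line else None
        if not m or not m.group("uuid"):
            continue  # entries without a uuid carry no identity to track
        uuid = m.group("uuid")
        # Both the proper address and the commit address belong to this uuid.
        for addr in (m.group("proper"), m.group("commit")):
            if addr:
                email_to_uuid[addr.lower()] = uuid
        uuid_to_contact[uuid] = f"{m.group('name')} <{m.group('proper')}>"
    return email_to_uuid, uuid_to_contact


def resolve(email, historical_mailmap, current_mailmap):
    """Step 1: email -> uuid via the old tree; step 2: uuid -> today's contact."""
    old_email_to_uuid, _ = parse_mailmap(historical_mailmap)
    _, current_contacts = parse_mailmap(current_mailmap)
    uuid = old_email_to_uuid.get(email.lower())
    return current_contacts.get(uuid) if uuid else None


OLD = "A. U. Thor <foo@example.com> [uuid A] <ada@example.org>\n"
NEW = (
    "A. U. Thor <ada@example.org> [uuid A]\n"
    "Roy G. Biv <foo@example.com> [uuid B] <roy@example.org>\n"
)
print(resolve("foo@example.com", OLD, NEW))  # A. U. Thor <ada@example.org>
print(resolve("foo@example.com", NEW, NEW))  # Roy G. Biv <foo@example.com>
```

The two prints are the point of the exercise: the same recycled address resolves to Ada or to Roy depending on which tree's mailmap is used for the first step.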

What about abstracting this into a map-email or map-identity hook of some kind, invoked whenever Git needs to write an identity (committer, author, Signed-off-by, etc.) or display one? That way, anyone who wants to can implement whatever policy they like for replacing emails with some other value in the repo, and for mapping them back again. It would be worth caching the results so that the hook is invoked only once per distinct identity per command; otherwise git log could become prohibitively expensive.
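
The caching part is the easy bit. A minimal Python sketch, assuming a hypothetical external hook called as `map-identity DIRECTION VALUE` that prints its answer on stdout (Git has no such hook today; `run_hook` and `make_identity_mapper` are names I made up):

```python
import functools
import subprocess


def run_hook(direction, value, hook="map-identity"):
    """Call a hypothetical external hook as `map-identity DIRECTION VALUE`."""
    result = subprocess.run(
        [hook, direction, value], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def make_identity_mapper(hook_runner=run_hook):
    """Memoize hook calls for the lifetime of one command (e.g. one git log).

    Each distinct (direction, value) pair reaches the hook once; repeats are
    served from the cache, so mapping the authors of 10,000 commits written
    by 20 people costs 20 hook invocations rather than 10,000.
    """

    @functools.lru_cache(maxsize=None)
    def map_identity(direction, value):
        return hook_runner(direction, value)

    return map_identity


calls = []


def fake_hook(direction, value):
    # Stand-in for the real hook: record the call, return a dummy token.
    calls.append((direction, value))
    return "token:" + value


map_identity = make_identity_mapper(fake_hook)
authors = ["Ada <ada@example.org>", "Roy <roy@example.org>", "Ada <ada@example.org>"]
tokens = [map_identity("from", a) for a in authors]
print(len(calls))  # 2
```

Injecting the runner keeps the cache testable without a real hook; the cache is per process, which matches "once per identity per request".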

Something like map-identity from <external-identity>, which returns the internal value to store, and map-identity to <internal-value>, which returns the external identity, for example:

    $ map-identity from "Randall S. Becker <rsbecker@xxxxxxxxxxxxx>"
    A056AAB2123

and

    $ map-identity to A056AAB2123
    Randall S. Becker <rsbecker@xxxxxxxxxxxxx>
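
For what it's worth, a toy hook implementing that from/to interface could be as small as the Python sketch below. Everything in it is hypothetical: the hook name, the argument convention, and the hard-coded table (which just reuses the token from the example above; a real policy might read a file kept outside the repository or ask a directory service).

```python
import sys

# Placeholder policy data taken from the example above: stored token -> identity.
TOKEN_TO_IDENTITY = {
    "A056AAB2123": "Randall S. Becker <rsbecker@xxxxxxxxxxxxx>",
}
IDENTITY_TO_TOKEN = {ident: tok for tok, ident in TOKEN_TO_IDENTITY.items()}


def map_identity(direction, value):
    """`from`: identity -> stored value; `to`: stored value -> identity.

    Unknown values pass through unchanged, so unmapped contributors still work.
    """
    if direction == "from":
        return IDENTITY_TO_TOKEN.get(value, value)
    if direction == "to":
        return TOKEN_TO_IDENTITY.get(value, value)
    raise ValueError(f"unknown direction: {direction!r}")


def main(argv):
    """Entry point for `map-identity from|to VALUE`; returns an exit status."""
    if len(argv) != 3:
        print("usage: map-identity from|to VALUE", file=sys.stderr)
        return 2
    try:
        print(map_identity(argv[1], argv[2]))
    except ValueError as err:
        print(f"map-identity: {err}", file=sys.stderr)
        return 1
    return 0
```

Installed as an executable hook, the file would end with `sys.exit(main(sys.argv))` behind the usual `if __name__ == "__main__":` guard. Passing unknown values through unchanged is only one possible policy; a stricter site might reject them instead.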

Again, just a notion.