Re: [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Thu, 22 Sep 2022 00:08:27 +0000

On 2022-09-20 at 10:51:39, Ævar Arnfjörð Bjarmason wrote:
> Wouldn't we get the same thing if *by convention* we just supported
> authorship like this, (which we already support):
> 
> 	UUID=$(get-some-uuid)
>         git config user.name X
>         git config user.email $UUID.uuid.git.example.org

You can indeed use a UUID if you want.  However, it's not deterministic.

Using a key hash also means account linking is trivially implemented in
forges.  If we use a UUID, then there's no way to prove ownership of the
identifier, which means that people can claim other people's commits.
Signed commits don't help here because you can't embed arbitrary
non-emails in X.509 (or in OpenPGP, because nobody will certify such an
ID), so you have no way of linking the commit identity to the key and
therefore signed commits are worse than before.  At least with an email
you can verify that the owner of the account owns the email address, but
you can't do that with a UUID.

I want a design that works whether or not you use a forge, but realizing
that most developers use forges these days, I want to make the workflow as
simple and straightforward as possible for those who do.  I also want a
design which is going to be acceptable to forge implementers, and
working for one, I think this design is going to be easier to implement
and more likely to be accepted than an ID which requires extra work and
isn't verifiable.

For ease of use, I would be implementing tooling to make setting this
from an existing user.signingkey or SSH key on the system.  I literally
envision this being as simple as something like `git id --set -f
~/.ssh/id_ed25519` or `git id --set --generate-ssh-key`.  (This is just
an example; we can argue about the details later.)

> So you'd end up with e.g.:
> 
> 	X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>
> 
> Or whatever, we could bikeshed about the format, but the point is that
> it's not codifying *how* that looks.

I do very much want to codify how this looks because people are
absolutely going to rely on it, whether we want them to or not.  People
already parse GitHub's fake no-reply emails for information.  Everything
that Git does people rely on, whether we like it or not.

Keeping it in the form of an email maximizes compatibility for existing
implementations.

> We'd then just support this refs/mailmap mechanism you're suggesting,
> where we'd have a mapping like:
> 
>       Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>
> 
> Which could be force-pushed.
> 
> I can see why you'd *also* want to formalize the ID generation, but I
> just don't see why we'd want to make that as one leaping change rather
> than something more incremental.

We can make it as incremental as folks want.  However, the longer we
have people embedding their real names and emails in an immutable Merkle
tree, the longer we're going to run into deadname problems.  Thus,
encouraging this new form of ID sooner means that people will adopt it
sooner.

If this is the only impediment, we can make it more gradual.

> I.e. even if you don't have opaque IDs in the first place this mechanism
> would allow you to maintain a "mailmap" ref on the remote, which would
> already be useful.
> 
> E.g. now if I use a hosting provider and have my .mailmap in various
> repo I need to maintain then in each repo, but this would allow for a
> magical ref which would keep it up-to-date in various repos...

That's part of the goal.

> I obviously see why you want the "force push" aspect of this (the
> deadnaming), but I still wonder if it's really a good trade-off for git
> as an SCM to make that the default.
> 
> We've been going in the other direction for e.g. tags semi-recently with
> my 0bc8d71b99e (fetch: stop clobbering existing tags without --force,
> 2018-08-31).
> 
> By having that force-push default we make it so that a plumbing command
> (that makes use of mailmap) will give you one result today, but a
> different one tomorrow, with no easy way to get back.

I think force-pushing semantics has a nicer behaviour for my use case,
but it's not essential.  If the mailmap is in a separate ref, then if I
work at $MEGACORP and need to update the mailmap because of a name
change, I can still just rewrite the history, and as long as we preserve
the force-fetch behaviour by default, then it will just work.

I _do_ think we should retain the force-fetch behaviour by default.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
Attachment:
signature.asc

Description: PGP signature