Re: [RFC PATCH 2/2] docs: document a format for anonymous author and committer IDs

On Mon, Sep 19 2022, brian m. carlson wrote:

> The original design of Git embeds a personal name and email in every
> commit.  This has lots of downsides, including the following.
>
> First, people do not want to bake an email into an immutable Merkle tree
> that they send everywhere.  Spam, whether in general or by recruiters,
> is a problem, and even when it's not, people change companies or
> institutions and emails become invalid.
>
> Second, some people prefer to operate anonymously and don't want to
> specify personal details everywhere.
>
> Third, and most important, people change names.  This happens for many
> reasons, but it comes up most saliently for transgender people, who
> frequently change their name as part of their transition.  Referring to
> a transgender person's former name, their "deadname", is considered
> inappropriate.
>
> We have a solution that can map former personal names and emails into
> current ones, the mailmap.  However, this last case poses a problem,
> because we don't really want to correlate the person's deadname (or
> their email, which may contain their deadname) right next to their
> current name.
>
> Several solutions have been proposed for this case, including hashing or
> encoding the old information, but these are all easily invertible.
> Instead, let's propose a new form of identifier which is opaque and some
> mailmap improvements to store the mailmap information outside of the
> main history.

With you so far...

> Propose that users use the fingerprint of a cryptographic key as part of
> a special-form email which is not valid according to RFC 1123, but is
> accepted by earlier versions of Git.  Now that we have SSH signing and
> OpenSSH is available on all major platforms, creating a unique ID is as
> easy as running ssh-keygen.  This approach results in an identifier
> which is unique, deterministic, and completely anonymous.

...but...

> Propose this new option instead of using a name and email, although
> users can continue to use those as before if they prefer. Continue to
> associate personal information with this opaque identifier using the
> mailmap, but in such a way that it lives in a special ref outside of the
> history and that ref is customarily kept squashed to a single commit.
> Create a special RFC 5322 header to associate a mailmap entry with the
> user's opaque identifier when sending a patch if desired.

...while it's technically neat, I really don't see why this whole
hashing mechanism is a necessary prerequisite to get to this point.

Wouldn't we get the same thing if *by convention* we just supported
authorship like this (which we already support):

	# get-some-uuid stands in for any UUID generator
	UUID=$(get-some-uuid)
	git config user.name X
	git config user.email "$UUID.uuid.git.example.com"

So you'd end up with e.g.:

	X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>

Or whatever, we could bikeshed about the format, but the point is that
it's not codifying *how* that looks.
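
To make the convention concrete, here is a minimal sketch. The helper
names (new_uuid, opaque_email) and the ".uuid.git.example.com" suffix are
assumptions for illustration only, not part of any proposal, and it
assumes either uuidgen or the Linux /proc interface is available:

```shell
# Hypothetical helper: print a lowercase UUID using whatever is available.
new_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen | tr '[:upper:]' '[:lower:]'
    elif [ -r /proc/sys/kernel/random/uuid ]; then
        cat /proc/sys/kernel/random/uuid
    else
        echo "no UUID source found" >&2
        return 1
    fi
}

# Hypothetical helper: turn a UUID into an opaque "email" by appending a
# conventional suffix. The suffix is an assumption taken from the example.
opaque_email() {
    printf '%s.uuid.git.example.com\n' "$1"
}

# Usage, inside a repository:
#   git config user.name X
#   git config user.email "$(opaque_email "$(new_uuid)")"
```

Nothing here needs support from Git itself; user.name and user.email
are plain strings as far as "git config" is concerned.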

We'd then just support this refs/mailmap mechanism you're suggesting,
where we'd have a mapping like:

      Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>

Which could be force-pushed.
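
As a rough sketch of how that could be approximated with plumbing that
exists today: refs/mailmap is the mechanism under discussion, not a
current feature, and the helper names below (mailmap_entry,
write_mailmap_ref) are hypothetical. The idea is to keep a .mailmap blob
in a single parentless commit, so the ref never accumulates history:

```shell
# Format one mailmap line:
#   Proper Name <proper@email> Commit Name <commit@email>
mailmap_entry() {
    printf '%s <%s> %s <%s>\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: read mailmap text on stdin and point refs/mailmap
# at a fresh root commit containing it as ".mailmap". Run this inside a
# repository with a committer identity configured.
write_mailmap_ref() {
    blob=$(git hash-object -w --stdin) || return 1
    tree=$(printf '100644 blob %s\t.mailmap\n' "$blob" | git mktree) || return 1
    # No -p option, so the commit has no parent: the ref stays "squashed".
    commit=$(git commit-tree -m 'mailmap' "$tree") || return 1
    git update-ref refs/mailmap "$commit"
}

# Usage, inside a repository:
#   mailmap_entry 'A U Thor' 'author@example.com' \
#       'X' '98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com' |
#       write_mailmap_ref
#   git -c mailmap.blob=refs/mailmap:.mailmap check-mailmap \
#       'X <98ab8d66-38d2-11ed-a261-0242ac120002.uuid.git.example.com>'
#   git push origin +refs/mailmap:refs/mailmap
```

The existing mailmap.blob setting is what lets the lookup read from
somewhere other than the checked-out history here; the leading "+" in
the refspec is the force-push part.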

I can see why you'd *also* want to formalize the ID generation, but I
just don't see why we'd want to make that one leaping change rather
than something more incremental.

I.e. even if you don't have opaque IDs in the first place this mechanism
would allow you to maintain a "mailmap" ref on the remote, which would
already be useful.

E.g. now if I use a hosting provider and have my .mailmap in various
repos I need to maintain them in each repo, but this would allow for a
magical ref which would keep it up-to-date in various repos...

> [...]If a user would like to preserve a history
> +for some reason, they can use `--use-mailmap=commit`.  For maintainers, they can
> +then push this ref using the normal push refspecs, or explicitly with
> +`--mailmap`, which is equivalent to `+refs/mailmap:refs/mailmap`.

I obviously see why you want the "force push" aspect of this (the
deadnaming), but I still wonder if it's really a good trade-off for git
as an SCM to make that the default.

We've been going in the other direction for e.g. tags semi-recently with
my 0bc8d71b99e (fetch: stop clobbering existing tags without --force,
2018-08-31).

By having that force-push default we make it so that a plumbing command
(that makes use of mailmap) will give you one result today, but a
different one tomorrow, with no easy way to get back.

Maybe it's something we want in the end, but it's another thing that's
"changed while at it", i.e. not only are we introducing "mailmap" remote
refs, but also:

 * Changing the many-to-many mapping of history-mailmap to a
   many-to-one, i.e. the map is per-repo, not per-ref.

 * Changing it so that you can't track it as part of your history.

If we wanted to ease into just one of those we could have a "mailmap"
tag object, which we wouldn't clobber by default....




