What's in a name? Let's use a (uuid,name,email) triplet

Short Version:
-------------


Rather than use a (name,email) pair to identify people, let's use
a (uuid,name,email) triplet.

The uuid can be any piece of information that a user of git determines
to be reasonably unique across space and time and that is intended to
be used by that user virtually forever (at least within a project's
history).

For instance, the uuid could be an OSF DCE 1.1 UUID or the SHA-1 of
some easily remembered, already reasonably unique information.

This could really help keep identifications clean, and it is rather
straightforward and possibly quite efficient.


Long Version:
------------


There are 2 reasons why people contribute (pro bono) to projects:

  (0) To improve the project.
  (1) To garner recognition.

and in my experience, (0) is not as sweet without (1).

One of the great boons of distributed systems like git is that they
separate author (contributor) identities from committer identities,
thereby maintaining (some semblance of) proper attribution in an
official, structured format that is amenable to parsing by tools.

While git's use of (name,email) pairs to identify each person is
extremely practical, it turns out that it's rather `unstable';
consider the following information gleaned from a clone of the
official git repository:

    $ git shortlog -se origin/master | grep Linus
         3  Linus Torvalds <torvalds@xxxxxxxxxxxx>
       122  Linus Torvalds <torvalds@xxxxxxxxxxx>
       235  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
       276  Linus Torvalds <torvalds@xxxxxxxx>
         9  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxx(none)>
       439  Linus Torvalds <torvalds@xxxxxxxxxxxxxxx>
         9  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxxxxxxxx>

    $ git shortlog -se origin/master | grep Junio
      3658  Junio C Hamano <gitster@xxxxxxxxx>
         2  Junio C Hamano <junio@xxxxxxxxxxxxxxx>
         3  Junio C Hamano <junio@xxxxxxxxxx>
         3  Junio C Hamano <junio@xxxxxxxxx>
         8  Junio C Hamano <junio@xxxxxxxxxxx>
      4167  Junio C Hamano <junkio@xxxxxxx>
         2  Junio C Hamano <junkio@xxxxxxxxxxx>
         2  Junio Hamano <gitster@xxxxxxxxx>

or using a clone of Linus's Linux repo:

    $ git shortlog -se origin/master | grep Linus
         2  Linus Luessing <linus.luessing@xxxxxx>
         2  Linus Lüssing <linus.luessing@xxxxxx>
         2  Linus Nilsson <lajnold@xxxxxxxxxx>
         2  Linus Nilsson <lajnold@xxxxxxxxx>
        32  Linus Torvalds <torvalds@xxxxxxxxxxxx>
      1522  Linus Torvalds <torvalds@xxxxxxxxxxx>
      4174  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
         7  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxx>
         2  Linus Torvalds <torvalds@xxxxxxxxxxxxxx>
         8  Linus Torvalds <torvalds@xxxxxxxx>
         4  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxx(none)>
       166  Linus Torvalds <torvalds@xxxxxxxxxxxxxxx>
         1  Linus Torvalds <torvalds@xxxxxxxxxxxxx>
      1606  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxxxxxxxx>
       174  Linus Torvalds <torvalds@xxxxxxxxxxxxxx>
         1  Linus Walleij (LD/EAB <linus.walleij@xxxxxxxxxxxx>
         3  Linus Walleij <linus.ml.walleij@xxxxxxxxx>
         1  Linus Walleij <linus.walleij@xxxxxxxxxxxx>
        81  Linus Walleij <linus.walleij@xxxxxxxxxxxxxx>
         9  Linus Walleij <triad@xxxxxxxxx>

    $ git shortlog -se origin/master | grep Morton
       581  Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
       836  Andrew Morton <akpm@xxxxxxxx>
         1  Andrew Morton <len.brown@xxxxxxxxx>

From these few examples it seems pretty clear that the most volatile
portion of the (name,email) pair is the email, which is unfortunate
because the email is the most uniquely identifying information. Are
we really reasonably certain that these two are the same person?

    Linus Walleij <linus.ml.walleij@xxxxxxxxx>
    Linus Walleij <linus.walleij@xxxxxxxxxxxx>
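
To make the instability above easier to see, here is a rough sketch that
groups `git shortlog -se` style lines by name and counts the distinct
emails per name. The sample names and addresses are invented placeholders
(the real addresses are obscured in the listings above), and the function
name emails_by_name is hypothetical.

```python
import re
from collections import defaultdict

# Invented sample in the shape printed by `git shortlog -se`.
SAMPLE = """\
     3  Alice Example <alice@old.example.com>
   122  Alice Example <alice@new.example.com>
     2  Bob Example <bob@example.org>
"""

# count, then name, then <email>
LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s+<([^>]*)>\s*$")

def emails_by_name(shortlog_text):
    """Map each author name to a dict of {email: commit count}."""
    result = defaultdict(dict)
    for line in shortlog_text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip anything that is not a shortlog entry
        count, name, email = match.groups()
        result[name][email] = result[name].get(email, 0) + int(count)
    return dict(result)

groups = emails_by_name(SAMPLE)
for name, emails in sorted(groups.items()):
    print(f"{name}: {len(emails)} email(s), {sum(emails.values())} commit(s)")
```

Note that grouping by name alone cannot tell two different people with
the same name apart, which is exactly the ambiguity in question.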

Thus, I propose a more stable form of identification; rather than
using just a (name,email) pair, let's use a (uuid,name,email) triplet,
where the uuid can be any piece of information that a user of git
determines to be reasonably unique across space and time and that is
intended to be used by that user virtually forever (at least within a
project's history).
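
As a minimal sketch of what the triplet buys a tool, assume each commit
carries a (uuid,name,email) record; tallying by the uuid field then
survives any change of name or email. The Identity type, the sample
records, and the "id-alice"/"id-bob" strings are all made up for
illustration; none of this is an existing git feature.

```python
from collections import Counter
from typing import NamedTuple

class Identity(NamedTuple):
    uuid: str   # the stable identifier proposed above (any string)
    name: str
    email: str

# Invented records: one Identity per commit.
commits = [
    Identity("id-alice", "Alice Example", "alice@old.example.com"),
    Identity("id-alice", "Alice Example", "alice@new.example.com"),
    Identity("id-alice", "Alice E. Example", "alice@new.example.com"),
    Identity("id-bob", "Bob Example", "bob@example.org"),
]

def commits_per_person(records):
    """Count commits per uuid, ignoring name and email entirely."""
    return Counter(record.uuid for record in records)

totals = commits_per_person(commits)
for identifier, count in sorted(totals.items()):
    print(identifier, count)
```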

For instance, Linus is always stuck in his basement with the same
ancient computers, so he chooses to set up his few ~/.gitconfig
files with an OSF DCE 1.1 conforming UUID (generated by, say, uuidgen):

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

    [user]
        uuid  = 6b202ed1-e8ec-4048-84c2-ae0dd3b2df47
        name  = Linus Torvalds
        email = torvalds@xxxxxxxxxxxxxxxxxxxx
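
For anyone without uuidgen at hand, a random UUID can be produced in much
the same way with Python's standard uuid module; the sketch below renders
a [user] section like the one above. Keep in mind that the uuid key is
only the setting proposed in this message (git attaches no meaning to
it), that user_section is a hypothetical helper, and that the name and
address are placeholders.

```python
import uuid

def user_section(name, email, identifier=None):
    """Render a [user] section that includes the proposed uuid key."""
    if identifier is None:
        # Random (version 4) UUID in the usual 8-4-4-4-12 text form.
        identifier = str(uuid.uuid4())
    return (
        "[user]\n"
        f"    uuid  = {identifier}\n"
        f"    name  = {name}\n"
        f"    email = {email}\n"
    )

section = user_section("Alice Example", "alice@example.com")
print(section, end="")
```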

On the other hand, Junio is infatuated with the latest palmtop
computing gadgets and finds himself setting up a ~/.gitconfig file
several times each month; he doesn't want to bother remembering
some long human-hostile string, so he adopts as his uuid the
SHA-1 of some easily remembered piece of information like the
very first (name,email) pair that he used for git
(Junio C Hamano <junkio@xxxxxxx>):

    [user]
        uuid  = 6e99d26860f0b87ef4843fa838df2a918b85d1f7
        name  = Junio C Hamano
        email = gitster@xxxxxxxxx
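
Deriving such a value is cheap to redo on any new machine. The sketch
below hashes a "Name <email>" string with SHA-1; the exact bytes that
were hashed for the value above are not specified here (encoding,
trailing newline, and so on), and the address is a placeholder, so the
digest it prints will not match that value. The helper name
identity_digest is hypothetical.

```python
import hashlib

def identity_digest(name, email):
    """SHA-1 hex digest of 'Name <email>' encoded as UTF-8, no newline."""
    text = f"{name} <{email}>"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

digest = identity_digest("Alice Example", "alice@example.com")
print(digest)
```

The same inputs always give the same 40-character hex string, so the only
thing to remember is the original (name,email) pair.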

I'm sure that some optimizations could be made for certain choices like
UUID and SHA-1 strings.
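
One possible reading of that, offered only as a guess: a tool could
recognize the two well-known textual forms and store them as raw bytes
(16 for a UUID, 20 for a SHA-1) instead of 36 or 40 characters, falling
back to the plain string otherwise. The function pack_identifier is
hypothetical, and a real format would also need to record which kind of
value was packed.

```python
import re
import uuid

UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")
SHA1_RE = re.compile(r"^[0-9a-f]{40}$")

def pack_identifier(text):
    """Return a compact byte form for UUID or SHA-1 text, else UTF-8."""
    lowered = text.lower()
    if UUID_RE.match(lowered):
        return uuid.UUID(lowered).bytes      # 16 bytes
    if SHA1_RE.match(lowered):
        return bytes.fromhex(lowered)        # 20 bytes
    return text.encode("utf-8")              # arbitrary identifier

packed_uuid = pack_identifier("6b202ed1-e8ec-4048-84c2-ae0dd3b2df47")
packed_sha1 = pack_identifier("6e99d26860f0b87ef4843fa838df2a918b85d1f7")
print(len(packed_uuid), len(packed_sha1))
```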

Anyway, I think this could really help keep identifications clean,
and it is rather straightforward and possibly quite efficient.

Sincerely,
Michael Witten