Short Version:
--------------

Rather than use a (name,email) pair to identify people, let's use
a (uuid,name,email) triplet.

The uuid can be any piece of information that a user of git
determines to be reasonably unique across space and time and that
is intended to be used by that user virtually forever (at least
within a project's history).

For instance, the uuid could be an OSF DCE 1.1 UUID or the SHA-1 of
some easily remembered, already reasonably unique information.

This could really help keep identifications clean, and it is rather
straightforward and possibly quite efficient.

Long Version:
-------------

There are 2 reasons why people contribute (pro bono) to projects:

  (0) To improve the project.
  (1) To garner recognition.

and in my experience, (0) is not as sweet without (1).

One of the great boons of distributed systems like git is that they
separate author (contributor) identities from committer identities,
thereby maintaining (some semblance of) proper attribution in an
official, structured format that is amenable to parsing by tools.
While git's use of (name,email) pairs to identify each person is
extremely practical, it turns out that it's rather `unstable';
consider the following information gleaned from a clone of the
official git repository:

  $ git shortlog -se origin/master | grep Linus
       3  Linus Torvalds <torvalds@xxxxxxxxxxxx>
     122  Linus Torvalds <torvalds@xxxxxxxxxxx>
     235  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
     276  Linus Torvalds <torvalds@xxxxxxxx>
       9  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxx(none)>
     439  Linus Torvalds <torvalds@xxxxxxxxxxxxxxx>
       9  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxxxxxxxx>

  $ git shortlog -se origin/master | grep Junio
    3658  Junio C Hamano <gitster@xxxxxxxxx>
       2  Junio C Hamano <junio@xxxxxxxxxxxxxxx>
       3  Junio C Hamano <junio@xxxxxxxxxx>
       3  Junio C Hamano <junio@xxxxxxxxx>
       8  Junio C Hamano <junio@xxxxxxxxxxx>
    4167  Junio C Hamano <junkio@xxxxxxx>
       2  Junio C Hamano <junkio@xxxxxxxxxxx>
       2  Junio Hamano <gitster@xxxxxxxxx>

or using a clone of Linus's Linux repo:

  $ git shortlog -se origin/master | grep Linus
       2  Linus Luessing <linus.luessing@xxxxxx>
       2  Linus Lüssing <linus.luessing@xxxxxx>
       2  Linus Nilsson <lajnold@xxxxxxxxxx>
       2  Linus Nilsson <lajnold@xxxxxxxxx>
      32  Linus Torvalds <torvalds@xxxxxxxxxxxx>
    1522  Linus Torvalds <torvalds@xxxxxxxxxxx>
    4174  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
       7  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxx>
       2  Linus Torvalds <torvalds@xxxxxxxxxxxxxx>
       8  Linus Torvalds <torvalds@xxxxxxxx>
       4  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxx(none)>
     166  Linus Torvalds <torvalds@xxxxxxxxxxxxxxx>
       1  Linus Torvalds <torvalds@xxxxxxxxxxxxx>
    1606  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxxxxxxxx>
     174  Linus Torvalds <torvalds@xxxxxxxxxxxxxx>
       1  Linus Walleij (LD/EAB <linus.walleij@xxxxxxxxxxxx>
       3  Linus Walleij <linus.ml.walleij@xxxxxxxxx>
       1  Linus Walleij <linus.walleij@xxxxxxxxxxxx>
      81  Linus Walleij <linus.walleij@xxxxxxxxxxxxxx>
       9  Linus Walleij <triad@xxxxxxxxx>

  $ git shortlog -se origin/master | grep Morton
     581  Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
     836  Andrew Morton <akpm@xxxxxxxx>
       1  Andrew Morton <len.brown@xxxxxxxxx>

From these few examples it seems pretty clear that the most volatile
portion of the (name,email) pair is the email, which is unfortunate
because the email is the most uniquely identifying information. Are
we really reasonably certain that these two are the same person?

  Linus Walleij <linus.ml.walleij@xxxxxxxxx>
  Linus Walleij <linus.walleij@xxxxxxxxxxxx>

Thus, I propose a more stable form of identification; rather than
using just a (name,email) pair, let's use a (uuid,name,email)
triplet, where the uuid can be any piece of information that a user
of git determines to be reasonably unique across space and time and
that is intended to be used by that user virtually forever (at least
within a project's history).

For instance, Linus is always stuck in his basement with the same
ancient computers, so he chooses to set up his few ~/.gitconfig
files with an OSF DCE 1.1 conforming UUID (generated by, say,
uuidgen):

  Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

  [user]
      uuid = 6b202ed1-e8ec-4048-84c2-ae0dd3b2df47
      name = Linus Torvalds
      email = torvalds@xxxxxxxxxxxxxxxxxxxx

On the other hand, Junio is infatuated with the latest palmtop
computing gadgets and finds himself setting up a ~/.gitconfig file
several times each month; he doesn't want to bother remembering some
long human-hostile string, so he adopts as his uuid the SHA-1 of
some easily remembered piece of information like the very first
(name,email) pair that he used for git
(Junio C Hamano <junkio@xxxxxxx>):

  [user]
      uuid = 6e99d26860f0b87ef4843fa838df2a918b85d1f7
      name = Junio C Hamano
      email = gitster@xxxxxxxxx

I'm sure that some optimizations could be made for certain choices
like UUID and SHA-1 strings.

Anyway, I think this could really help keep identifications clean,
and it is rather straightforward and possibly quite efficient.
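
To make the SHA-1 option and its payoff a bit more concrete, here is a
minimal sketch in Python (purely illustrative; nothing like it exists
in git today). The helper names uuid_from_identity and group_by_uuid
are hypothetical, the example.org addresses are placeholders for the
redacted ones above, and hashing the "Name <email>" string as UTF-8
with no trailing newline is my own assumption, so the digest need not
match the one shown in the ~/.gitconfig example.

```python
import hashlib
from collections import defaultdict


def uuid_from_identity(name: str, email: str) -> str:
    """Derive a hex SHA-1 'uuid' from a remembered (name, email) pair.

    Assumption: the pair is hashed as "Name <email>", UTF-8 encoded,
    with no trailing newline.
    """
    ident = f"{name} <{email}>"
    return hashlib.sha1(ident.encode("utf-8")).hexdigest()


def group_by_uuid(records):
    """Merge (uuid, name, email, commit_count) records that share a uuid."""
    groups = defaultdict(lambda: {"commits": 0, "identities": set()})
    for uuid, name, email, count in records:
        groups[uuid]["commits"] += count
        groups[uuid]["identities"].add((name, email))
    return dict(groups)


# Placeholder addresses; the counts are taken from the shortlog above.
junio = uuid_from_identity("Junio C Hamano", "junkio@example.org")
records = [
    (junio, "Junio C Hamano", "junkio@example.org", 4167),
    (junio, "Junio C Hamano", "gitster@example.org", 3658),
    (junio, "Junio Hamano", "gitster@example.org", 2),
]
merged = group_by_uuid(records)
print(merged[junio]["commits"])  # prints 7827
```

With the uuid as the grouping key, the three spellings collapse into a
single contributor no matter how the name or email drifted over time.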

Sincerely,
Michael Witten