Re: What's in a name? Let's use a (uuid,name,email) triplet

On Thu, Mar 18, 2010 at 5:27 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> On Thu, 18 Mar 2010, Michael Witten wrote:
>>
>> This is all that I'm saying: Keep git exactly the way it is, but add
>> one extra piece of identifying information for each person.
>
> The thing is, you don't seem to realize that most authorship is over
> email.
>
> Let's take some numbers from the kernel archive, for example. Here's _one_
> trivial way to count it:
>
>  - number of commits where author/committer email matches (presumably
>   _not_ emailed, although sometimes people commit their own patches that
>   were emailed around):
>
>        [torvalds@i5 linux]$ git log --no-merges "--pretty=format:%h-%ae%n%h-%ce" | uniq -d | wc
>          33473   33473  959167
>
>  - total number of commits:
>
>        [torvalds@i5 linux]$ git rev-list --no-merges HEAD | wc
>         176415  176415 7233015
>
> IOW, less than a fifth of the patches were done by the person who actually
> committed things. 80%+ of all changes were committed by somebody else than
> the author.
>
> How do you think the authorship information can be transferred sanely,
> considering that the author didn't even use git in the first place?
> Really?
>
> That's where the typos/mistakes/missing-info really happens. And it often
> starts out with incomplete information, because the person has a bad email
> setup, and the thing only has an email address to begin with, ie the
> "From:" might literally say just "tytso@xxxxxxx" or something (to pick an
> example from the Cc list in this discussion - when Ted sends real emails,
> they tend to have proper naming).

If I recall correctly, the top source of errors was variation in the
domain name of the email address. Second place was mangling of names
from non-ASCII charsets. Third place was human typos. Fourth was
inconsistency in the human name, like Ted's example.

A really simple check would be for git to say, before it commits:
"I've never seen this name/email combo before. Are you sure it is
correct?"
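
To make the idea concrete, here is a rough sketch of such a check,
written as a standalone script rather than as anything git does today.
The function names (known_identities, split_identity, check_identity)
are made up for illustration; the only git feature assumed is
"git log --format" with the %an and %ae placeholders.

```python
import subprocess


def known_identities(repo="."):
    """Collect every 'Name <email>' author string already in history."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%an <%ae>"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.splitlines())


def split_identity(ident):
    """Split 'Name <email>' into (name, email); name may be empty."""
    name, _, rest = ident.rpartition(" <")
    return name, rest.rstrip(">")


def check_identity(name, email, known):
    """Return None if the exact pair was seen before, else a warning.

    The warning lists known identities that share either the name or
    the email (case-insensitively), since those are the likely typos.
    """
    if f"{name} <{email}>" in known:
        return None
    near = sorted(
        k for k in known
        if split_identity(k)[0].lower() == name.lower()
        or split_identity(k)[1].lower() == email.lower()
    )
    msg = f"never seen '{name} <{email}>' before; are you sure it is correct?"
    if near:
        msg += " Similar: " + ", ".join(near)
    return msg
```

Something like this could be called from a hook or from whatever
script applies emailed patches, passing known_identities() as the
third argument. It only catches combos that are new to the repository;
it cannot tell which of two spellings is the right one.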

PS - I am not in favor of the UUID scheme.

>
> Sometimes we'll edit the messages to have the "From: xyz <abc>" thing at
> the top, fixing up the incomplete thing then. Typos happen there. Or the
> patch will simply come in two different ways, so there's no typo, yet
> there are two different emails that get author attribution.
>
> The thing is, development really is about human interaction. Yes, there's
> a tool involved (git), and once the data is in the tool we won't lose it
> any more, but this is about getting the data _into_ the tool in the first
> place.
>
> And the data you want to add simply DOES NOT EXIST. And we can't make it
> exist. The fact that even the trivial and obvious data that git _does_ ask
> for gets to be incomplete should tell you something.
>
>                        Linus
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Jon Smirl
jonsmirl@xxxxxxxxx