On Thu, Mar 18, 2010 at 5:27 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> On Thu, 18 Mar 2010, Michael Witten wrote:
>>
>> This is all that I'm saying: Keep git exactly the way it is, but add
>> one extra piece of identifying information for each person.
>
> The thing is, you don't seem to realize that most authorship is over
> email.
>
> Let's take some numbers from the kernel archive, for example. Here's _one_
> trivial way to count it:
>
>  - number of commits where author/committer email matches (presumably
>    _not_ emailed, although sometimes people commit their own patches that
>    were emailed around):
>
>      [torvalds@i5 linux]$ git log --no-merges "--pretty=format:%h-%ae%n%h-%ce" | uniq -d | wc
>        33473   33473  959167
>
>  - total number of commits:
>
>      [torvalds@i5 linux]$ git rev-list --no-merges HEAD | wc
>       176415  176415 7233015
>
> IOW, less than a fifth of the patches were done by the person who actually
> committed things. 80%+ of all changes were committed by somebody else than
> the author.
>
> How do you think the authorship information can be transferred sanely,
> considering that the author didn't even use git in the first place?
> Really?
>
> That's where the typos/mistakes/missing-info really happens. And it often
> starts out with incomplete information, because the person has a bad email
> setup, and the thing only has an email address to begin with, ie the
> "From:" might literally say just "tytso@xxxxxxx" or something (to pick an
> example from the Cc list in this discussion - when Ted sends real emails,
> they tend to have proper naming).

If I recall correctly, the top source of errors is variation in the
domain name of the email address. Second place was mangling of names
from non-ASCII charsets. Third place was human typos. Fourth was
inconsistency in the human name, like Ted's example.

A really simple check would be for git to say: "I've never seen this
name/email combo before, are you sure it is correct before I commit
it?"
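For illustration, a rough sketch of such a check in Python. Nothing like this ships with git: `check_ident` and `split_ident` are hypothetical helpers, the known identities are assumed to be simple "Name <email>" lines such as the output of `git log --format='%an <%ae>'`, and the extra hints (same mailbox under another domain, same name with another address, same address with another name) are an assumed extension aimed at the error sources listed above.

```python
from email.utils import parseaddr


def split_ident(ident: str) -> tuple[str, str]:
    """Split 'Name <user@host>' into (name, email).

    The email is lowercased so pure case differences are not reported.
    Strictly, the local part is case-sensitive, so this is an assumption.
    """
    name, addr = parseaddr(ident)
    return name.strip(), addr.strip().lower()


def check_ident(new_ident: str, history: list[str]) -> list[str]:
    """Return warnings for a proposed author ident (hypothetical helper).

    `history` is assumed to hold simple 'Name <email>' lines, e.g. from
    `git log --format='%an <%ae>'`. An empty result means the exact
    name/email pair has been seen before.
    """
    name, addr = split_ident(new_ident)
    local = addr.partition("@")[0]
    known = {split_ident(line) for line in history if line.strip()}

    if (name, addr) in known:
        return []  # seen before, nothing to ask

    warnings = [f"never seen {new_ident!r} before; is it correct?"]
    for old_name, old_addr in sorted(known):
        if old_addr == addr:
            # Same address, but the human name differs (or is missing).
            warnings.append(f"same email, different name: {old_name!r}")
        elif name and old_name == name:
            warnings.append(f"same name, different email: {old_addr}")
        elif old_addr.partition("@")[0] == local:
            # Same mailbox under another domain (domain-name variation).
            warnings.append(
                f"same mailbox, different domain: {old_name} <{old_addr}>")
    return warnings
```

Given a history containing "Jane Doe <jane@example.org>", a bare "jane@mail.example.org" would get the "never seen" question plus a hint pointing at the existing Jane Doe address, which is the kind of domain slip described above.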
PS - I am not in favor of the UUID scheme.

>
> Sometimes we'll edit the messages to have the "From: xyz <abc>" thing at
> the top, fixing up the incomplete thing then. Typos happen there. Or the
> patch will simply come in two different ways, so there's no typo, yet
> there are two different emails that get author attribution.
>
> The thing is, development really is about human interaction. Yes, there's
> a tool involved (git), and once the data is in the tool we won't lose it
> any more, but this is about getting the data _into_ the tool in the first
> place.
>
> And the data you want to add simply DOES NOT EXIST. And we can't make it
> exist. The fact that even the trivial and obvious data that git _does_ ask
> for gets to be incomplete should tell you something.
>
>                 Linus
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
Jon Smirl
jonsmirl@xxxxxxxxx