Re: git as an sfc member project

Jeff King <peff@xxxxxxxx> · Sat, 23 Oct 2010 09:39:48 -0400

On Sat, Oct 23, 2010 at 11:52:26AM +0000, Ævar Arnfjörð Bjarmason wrote:

> Either way it doesn't matter, since I'm not interested in being a SFC
> liaison. I just want to hack, not deal with issues like these (but
> more power to people who want to).

I didn't mean to pick on you, btw. It's just that I was surprised to see
you, whose first commit was only 6 months ago, in the list of top
contributors by lines of code. You're productive, but not _that_
productive. :)

As it turns out, even though Junio's numbers are doubled, you are in
fact high by line count. It's because of compat/regex:

  $ git log --pretty=format: --numstat --author=Bjarmason compat/regex/* |
    perl -ne '/^\d+/ and $total += $&; END { print "$total\n"; }'
  11186

which accounts for 85% of your contribution. :)

> But I think picking people for anything based on the number of lines
> that git-blame thinks people "own" is a bad criterion. My contributions
> to Git are relatively small, but I've happened to pick projects (the
> test suite, gettext) that have touched a lot of lines of code.
> 
> But other people who've done 10x more work than I have (both in time &
> importance) probably have 10x fewer lines of code assigned to them.

I think counting surviving lines via git-blame is not that bad a metric
for importance. It's certainly better than counting added lines (as I
did above), as it measures lines that people are actually still using.
The problem here is that we have quite large chunks of "uninteresting"
code. Junio made some attempt to account for this by counting various
parts of the codebase separately. Probably compat/ should have been
removed from the core count (ditto for Marius Storm-Olsen, whose line
count is inflated by importing nedmalloc). That isn't to say that any of
these contributions aren't important. They just aren't the same as
sitting down and writing 10,000 lines of custom git code.
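
For reference, a surviving-line count can be sketched by tallying the
"author" header lines in `git blame --line-porcelain` output. This is
only a rough illustration in Python, not how Junio produced his numbers;
the function name, the sample data, and the authors "Alice" and "Bob"
are made up for the example.

```python
from collections import Counter

def surviving_lines(porcelain):
    """Count blamed lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.split("\n"):
        # Header lines start at column 0, while file content lines are
        # prefixed with a TAB, so source text that happens to begin with
        # "author " is never mistaken for a header.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts

# Hypothetical, trimmed-down porcelain output for a three-line file.
SAMPLE = "\n".join([
    "1111111111111111111111111111111111111111 1 1 2",
    "author Alice",
    "author-mail <alice@example.com>",
    "filename demo.c",
    "\tint main(void)",
    "1111111111111111111111111111111111111111 2 2",
    "author Alice",
    "author-mail <alice@example.com>",
    "filename demo.c",
    "\t{",
    "2222222222222222222222222222222222222222 3 3 1",
    "author Bob",
    "author-mail <bob@example.com>",
    "filename demo.c",
    "\tauthor = lookup();",
])

for name, count in surviving_lines(SAMPLE).most_common():
    print(count, name)
```

In a real repository you would feed it the blame output of each tracked
file, and leaving out paths like compat/ is then just a matter of which
files you pass in.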

In general, any line count of code (surviving or otherwise) will favor
people who are adding features rather than fixing bugs. I prefer commit
count, where I personally fare much better. :)

One interesting metric to me is the ratio of commit log lines to code
lines. A high ratio implies (to some degree) working on bugfixes, where
the actual changed lines of code are less important than the time you
spend figuring out _which_ lines to change.

You can measure it with something like:

  $ git log --format='Author: %an%n%w(0,4,4)%B' --numstat --no-merges |
    perl -ne '
      if (/^Author: (.*)/) { $author = $1 }
      elsif (/^\s{4}.+/) { $commit{$author}++ }
      elsif (/^\d+/) { $code{$author} += $& }
      END {
        print($commit{$_} / $code{$_}, " $_\n")
          for grep { $code{$_} } keys(%code)
      }
    ' | sort -rn

Of course it has its own set of flaws. One giant feature contribution
can outweigh a lot of bugfixes in the average.
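
To make that flaw concrete, here is a small Python sketch with invented
numbers (the commit sizes and the helper name are assumptions, not
measurements from git.git). Like the one-liner above, it divides an
author's total log lines by their total added code lines.

```python
def log_to_code_ratio(commits):
    """commits: list of (log_lines, added_code_lines) pairs for one author."""
    log_total = sum(log for log, _ in commits)
    code_total = sum(code for _, code in commits)
    return log_total / code_total

# Hypothetical author: twenty small bugfixes, each with a 10-line commit
# message explaining a 2-line change.
bugfixes = [(10, 2)] * 20

# The same author also lands one large feature: 30 log lines, 5000 code lines.
feature = (30, 5000)

print(log_to_code_ratio(bugfixes))              # 200 / 40
print(log_to_code_ratio(bugfixes + [feature]))  # 230 / 5040
```

With only the bugfixes the ratio is 5.0. Adding the single feature
commit drops it below 0.05, even though the bugfix work is unchanged.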

-Peff