[GitStats] Bling bling or some statistics on the git.git repository

[I sent this mail earlier, but I think vger rejected it due to the
size of the attachments. I have uploaded them instead; they can be
found at:
http://alturin.googlepages.com/activity_per_author.txt
http://alturin.googlepages.com/full_activity.txt ]

Heya,

Today I sat down and finished the activity aggregation code. Now it is
possible to generate the attached files with the following commands:
$ stats.py author -e --id=an > activity_per_author.txt
$ stats.py author -a  > full_activity.txt

The first one calculates the activity of all developers on a per-file
basis and dumps it into the file. The "--id=an" switch sets the
grouping field to "%an" (see man git-log), since the default (%ae) is
not that helpful for git.git (I don't know people by their e-mail, I
know them by their name). This was already possible (with "author
-d"), but before one had to pick a specific developer, now it will
show for _all_ developers. This is interesting stuff, although for a
huge project like git it's a bit much to take in. What is probably
more interesting is the second command: it shows how much change a
file has had in its existence.
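
To make the grouping idea concrete, here is a rough sketch of per-author,
per-file aggregation. It is not the actual stats.py code: the "author:"
marker, the function names and the sample names are made up for
illustration. It assumes the log was produced with something like
"git log --numstat --pretty=format:author:%an" (or %ae to group by e-mail).

```python
from collections import defaultdict

MARKER = "author:"  # hypothetical prefix, chosen so it cannot look like a numstat line


def log_format(id_field="an"):
    """Build the --pretty argument; id_field is a git-log placeholder such as 'an' or 'ae'."""
    return "format:" + MARKER + "%" + id_field


def activity_per_author(text):
    """Return {author: {path: [added, deleted]}} from the marked log text."""
    activity = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    author = None
    for line in text.splitlines():
        if line.startswith(MARKER):
            author = line[len(MARKER):]
            continue
        parts = line.split("\t", 2)
        if author is None or len(parts) != 3:
            continue  # blank separator lines and anything unexpected
        added, deleted, path = parts
        entry = activity[author][path]
        # Binary files are reported as "-", which is counted as 0 lines here.
        entry[0] += int(added) if added.isdigit() else 0
        entry[1] += int(deleted) if deleted.isdigit() else 0
    return activity


# Made-up sample input, only to show the shape of the data.
sample = "author:Alice\n3\t1\tMakefile\n\nauthor:Bob\n2\t2\tMakefile\n5\t0\tdiff.c\n"
result = activity_per_author(sample)
```

Switching the grouping field then only changes the string returned by
log_format(); the aggregation itself stays the same.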
I temporarily modified the code to output %04d instead of %4d so that
I could do the following:
$ stats.py author -a  > full_activity_sortable.txt
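
The reason for the zero padding can be sketched as follows: with a fixed
width and leading zeros, the text order of the lines is the same as the
numeric order of the counts, so a plain "sort" is enough. The format_row
helper is hypothetical; its format string is only inferred from the
listing in this mail, not taken from stats.py.

```python
def format_row(count, added, deleted, path, zero_pad=True):
    """Format one line in the shape '0977:  4703+  2705- = Makefile'."""
    count_fmt = "%04d" if zero_pad else "%4d"
    return (count_fmt + ": %5d+ %5d- = %s") % (count, added, deleted, path)


# Three rows taken from the listing in this mail.
rows = [
    (977, 4703, 2705, "Makefile"),
    (170, 2721, 1060, "refs.c"),
    (490, 1412, 619, "cache.h"),
]

# Sorting the formatted strings gives the same order as sorting by count.
lines = sorted(format_row(*row) for row in rows)
for line in lines:
    print(line)
```

An alternative that needs no code change would be "sort -n", which
compares the leading number numerically.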

A few highlights from the sorted file:

$ cat full_activity_sortable.txt | sort | tail -n 20
0170:  2721+  1060- = refs.c
0172:  4369+  2004- = builtin-pack-objects.c
0177:   345+   233- = GIT-VERSION-GEN
0178:  2855+  2121- = commit.c
0178:  4779+  2227- = fast-import.c
0179:  2677+  1400- = read-cache.c
0185:  5661+  2056- = builtin-apply.c
0186:  3269+  1255- = revision.c
0213:  1884+   460- = Documentation/config.txt
0232:  2257+  1621- = Documentation/git.txt
0236:  3990+  1991- = contrib/fast-import/git-p4
0281:  2753+  2220- = git.c
0333: 10259+  7150- = git-gui.sh
0338: 11337+  6187- = git-svn.perl
0338:  5755+  3159- = sha1_file.c
0397: 10230+  9599- = diff.c
0412: 23248+ 20257- = gitk
0432: 10580+  4502- = gitweb/gitweb.perl
0490:  1412+   619- = cache.h
0977:  4703+  2705- = Makefile

$ cat Makefile | wc -l
1482

For some reason you people can't seem to make up your mind about a
file that's not even 1500 lines in size ;). With almost a thousand
edits so far, it's been edited so many times it could've been written
from scratch three times (except that the number of lines deleted
doesn't match). Also interesting to note is that the "external" files
such as gitweb, gitk, git-gui and git-svn make up the bulk of all
changes. The two contenders from the native git camp are diff.c and
sha1_file.c, which both have a lot of LOC. This information is
interesting for GitStats as it might help determine which files have
had a lot of change, and which files are not touched a lot.
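
For reference, the "three times" estimate for the Makefile is just the
lines added divided by the current size, using the numbers shown above.
A quick check (the variable names are only for illustration):

```python
# Numbers for Makefile, taken from the listing and the wc output above.
added, deleted = 4703, 2705
current_lines = 1482

# How many times the current file could have been written from the added lines.
rewrites = added / current_lines

# Added minus deleted; this is the figure that does not match the current size.
net_growth = added - deleted

print("%.2f rewrites, net growth %d lines" % (rewrites, net_growth))
```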

A note is in order here: this data was mined with "git log --numstat",
so things like moving files and copying files are not accounted for. I
thought about using git-blame to gather this info before, but it is
not the right tool for the job. If anyone else has any ideas on what
would be better, please let me know and I'll happily dig into it :).
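
For anyone who wants to experiment, the mining step can be approximated
with the minimal sketch below. It is not the stats.py implementation: the
function names are made up, and it assumes that the first column in the
listing is the number of commits touching the file. Like the real run, it
keys purely on the path, so renames and copies show up as separate files.

```python
import subprocess
from collections import defaultdict


def read_git_log(repo="."):
    """Return numstat lines only; the empty format suppresses commit headers."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--numstat", "--pretty=format:"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def aggregate_numstat(text):
    """Return {path: [edits, added, deleted]} summed over all commits."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank lines between commits
        added, deleted, path = parts
        entry = totals[path]
        entry[0] += 1
        # Binary files are reported as "-"; they count as an edit with 0 lines.
        entry[1] += int(added) if added.isdigit() else 0
        entry[2] += int(deleted) if deleted.isdigit() else 0
    return totals


# Made-up sample in numstat shape (two commits), only to show the parsing.
sample = "3\t1\tMakefile\n10\t0\tcache.h\n\n2\t2\tMakefile\n-\t-\tlogo.png\n"
totals = aggregate_numstat(sample)
```

On a real checkout this would be called as
aggregate_numstat(read_git_log("/path/to/repo")).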

--
Cheers,

Sverre Rabbelier