[I sent this mail earlier, but I think vger rejected it due to the size of the attachments. I have uploaded them instead; they can be found at:
http://alturin.googlepages.com/activity_per_author.txt
http://alturin.googlepages.com/full_activity.txt ]

Heya,

Today I sat down and finished the activity aggregation code. It is now possible to generate the attached files with the following commands:

$ stats.py author -e --id=an > activity_per_author.txt
$ stats.py author -a > full_activity.txt

The first one calculates the activity of all developers on a per-file basis and dumps it into the file. The "--id=an" switch sets the grouping field to "%an" (see man git-log), since the default (%ae) is not that helpful for git.git (I don't know people by their e-mail, I know them by their name). This was already possible (with "author -d"), but before one had to pick a specific developer; now it shows the activity for _all_ developers. This is interesting stuff, although for a huge project like git it's a bit much to take in.

What is probably more interesting is the second command: it shows how much change a file has had in its existence.
I temporarily modified the code to output %04d instead of %4d, so that the output sorts correctly as plain text, and then ran:

$ stats.py author -a > full_activity_sortable.txt

A few highlights from the sorted file:

$ cat full_activity_sortable.txt | sort | tail -n 20
0170:  2721+  1060- = refs.c
0172:  4369+  2004- = builtin-pack-objects.c
0177:   345+   233- = GIT-VERSION-GEN
0178:  2855+  2121- = commit.c
0178:  4779+  2227- = fast-import.c
0179:  2677+  1400- = read-cache.c
0185:  5661+  2056- = builtin-apply.c
0186:  3269+  1255- = revision.c
0213:  1884+   460- = Documentation/config.txt
0232:  2257+  1621- = Documentation/git.txt
0236:  3990+  1991- = contrib/fast-import/git-p4
0281:  2753+  2220- = git.c
0333: 10259+  7150- = git-gui.sh
0338: 11337+  6187- = git-svn.perl
0338:  5755+  3159- = sha1_file.c
0397: 10230+  9599- = diff.c
0412: 23248+ 20257- = gitk
0432: 10580+  4502- = gitweb/gitweb.perl
0490:  1412+   619- = cache.h
0977:  4703+  2705- = Makefile

$ cat Makefile | wc -l
1482

For some reason you people can't seem to make up your mind about a file that's not even 1500 lines in size ;). With almost a thousand edits so far, it's been edited so many times it could've been written from scratch three times (except that the number of lines deleted doesn't match).

Also interesting to note is that the "external" files such as gitweb, gitk, git-gui and git-svn make up the bulk of all changes. The two contenders from the native git camp are diff.c and sha1_file.c, which both have a lot of LOC.

This information is interesting for GitStats as it might help determine which files have had a lot of change, and which files are not touched a lot.

A note is in order here: this data was mined with "git log --numstat", so things like moving files and copying files are not accounted for. I thought about using git-blame to gather this info before, but it is not the right tool for the job. If anyone else has any ideas on what would be better, please let me know and I'll happily dig into it :).
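
For anyone who wants to play with the numbers without GitStats, here is a rough sketch of how such per-file totals can be aggregated from git log's numstat output. This is not the actual stats.py code: the function names (parse_numstat, format_activity, git_numstat) are made up for illustration, the first column is assumed to be the number of commits touching the file, and the column widths only approximate the output above.

```python
import subprocess
from collections import defaultdict


def git_numstat(repo="."):
    """Return raw numstat rows for the whole history of `repo`.

    The empty --pretty=format: suppresses commit headers, so the output
    consists only of "added<TAB>deleted<TAB>path" rows and blank lines.
    """
    result = subprocess.run(
        ["git", "log", "--numstat", "--pretty=format:"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_numstat(log_text):
    """Aggregate numstat rows into {path: [commits, added, deleted]}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator line between commits
        added, deleted, path = parts
        if added == "-" and deleted == "-":
            added, deleted = "0", "0"  # binary files have no line counts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # not a numstat row after all
        entry = totals[path]
        entry[0] += 1             # one more commit touched this file
        entry[1] += int(added)
        entry[2] += int(deleted)
    return totals


def format_activity(totals):
    """Format the totals with a zero-padded count so plain `sort` works."""
    lines = [
        "%04d: %5d+ %5d- = %s" % (commits, added, deleted, path)
        for path, (commits, added, deleted) in totals.items()
    ]
    return sorted(lines)


# Usage, from inside a git checkout:
#   print("\n".join(format_activity(parse_numstat(git_numstat()))))
```

Like the data above, this counts a renamed or copied file as a separate path, so moves and copies are not followed.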
--
Cheers,

Sverre Rabbelier