Hi,

On Fri, 11 Jul 2008, Sverre Rabbelier wrote:

> I temporarily modified the code to output %04d instead of %4d so that I
> could do the following:
>
> $ stats.py author -a > full_activity_sortable.txt

You might be delighted to read up on the "-n" switch to sort(1).

> A few highlights from the sorted file:
>
> $ cat full_activity_sortable.txt | sort | tail -n 20

More intuitive would have been "sort -r | head -n 20", I guess.

> 0170: 2721+ 1060- = refs.c

I guess that 170 is the total number of commits touching that file, and
the "+" and "-" numbers are the lines added and deleted, respectively?

I think quite a lot of our changes do code moves; this should be
accounted for differently.

> 0172: 4369+ 2004- = builtin-pack-objects.c
> 0177: 345+ 233- = GIT-VERSION-GEN
> 0178: 2855+ 2121- = commit.c
> 0178: 4779+ 2227- = fast-import.c
> 0179: 2677+ 1400- = read-cache.c
> 0185: 5661+ 2056- = builtin-apply.c
> 0186: 3269+ 1255- = revision.c
> 0213: 1884+ 460- = Documentation/config.txt
> 0232: 2257+ 1621- = Documentation/git.txt
> 0236: 3990+ 1991- = contrib/fast-import/git-p4
> 0281: 2753+ 2220- = git.c
> 0333: 10259+ 7150- = git-gui.sh
> 0338: 11337+ 6187- = git-svn.perl
> 0338: 5755+ 3159- = sha1_file.c
> 0397: 10230+ 9599- = diff.c
> 0412: 23248+ 20257- = gitk
> 0432: 10580+ 4502- = gitweb/gitweb.perl
> 0490: 1412+ 619- = cache.h
> 0977: 4703+ 2705- = Makefile
>
> $ cat Makefile | wc -l
> 1482
>
> For some reason you people can't seem to make up your mind about a
> file that's not even 1500 lines in size ;).

Heh.  We might need to change it once or twice, in the future.

> A note is in order here, this data was mined with "git log --num-stat"
> so things like moving files and copying files are not accounted for.

In my opinion it would be even more interesting to see code moves (i.e.
not whole files).  For example, we moved some stuff from builtins into
the library.  The real change here is not in the lines added and
deleted.

> I thought about using git-blame to gather this info before, but it is
> not the right tool for the job. If anyone else has any idea's on what
> would be better please let me know and I'll happily dig into it :).

I think that you need to analyze the diff directly.

One possible (quick 'n dirty) way would be to cut out long consecutive
"+" parts of the hunks, replace the "-" by "+", and use "git diff
--no-index" to do the hard part of searching for that code in the "-"
part of the original diff.

If that turns out to be useful, we can still think about a proper API
using xdiff.

Just an idea,
Dscho
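
A rough sketch of the quick 'n dirty idea above, in Python and under loud assumptions: it takes the text of a unified diff, cuts out long runs of consecutive "+" lines, and looks for each run among the "-" lines of the same diff. It does not call "git diff --no-index" for the search; plain exact matching of whitespace-stripped lines is swapped in to keep the example self-contained, so it will miss moves that were edited along the way. The names added_runs, removed_lines and find_moves, and the min_len threshold, are hypothetical and not part of git or stats.py.

```python
def _is_header(line):
    # Assumption: "+++ " and "--- " mark file headers. A changed line whose
    # own content starts with "++ " or "-- " would be misread as a header.
    return line.startswith("+++ ") or line.startswith("--- ")


def added_runs(diff_text, min_len=3):
    """Return runs of at least min_len consecutive '+' lines, prefix removed."""
    runs, current = [], []
    for line in diff_text.splitlines():
        if line.startswith("+") and not _is_header(line):
            current.append(line[1:])
            continue
        # Any other line (context, '-', hunk or file header) ends the run.
        if len(current) >= min_len:
            runs.append(current)
        current = []
    if len(current) >= min_len:
        runs.append(current)
    return runs


def removed_lines(diff_text):
    """Return all '-' lines of the diff, prefix removed, in diff order."""
    return [line[1:] for line in diff_text.splitlines()
            if line.startswith("-") and not _is_header(line)]


def find_moves(diff_text, min_len=3):
    """Return the added runs that also occur contiguously in the '-' lines.

    Limitation: removed lines from all hunks are concatenated, so a match
    may in principle straddle a hunk boundary.
    """
    removed = [line.strip() for line in removed_lines(diff_text)]
    moves = []
    for run in added_runs(diff_text, min_len):
        wanted = [line.strip() for line in run]
        for start in range(len(removed) - len(wanted) + 1):
            if removed[start:start + len(wanted)] == wanted:
                moves.append(run)
                break
    return moves
```

Fed the output of something like "git show <commit>", find_moves() would return the blocks that look like pure code moves; their line counts could then be subtracted from the "+" and "-" totals for that commit. A real implementation would want fuzzy matching, which is where "git diff --no-index" or xdiff would come in.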