blame -M vs. log -p|grep -c ^+ weirdness

Hi all

I think I'm fundamentally misunderstanding something about the blame
code...  The other day I wanted to see how much our local fork of
DOMjudge diverged from their upstream.  You can grab the entire
history at

  git://csa.inf.ethz.ch/domjudge-public.git

if you want to try the commands I ran.

As a first statistic I looked at how many lines are blamed to our
local team (Christoph, Florian and me) by running

  git ls-files | while read f; do git blame -M -- "$f"; done |
  perl -pe 's/^\^?[a-f0-9]* (?:[^(]* )?\(([^2]*?) *20.*/$1/' |
  sort | uniq -c | sort -n

This shows that over 8000 lines (1245 + 1752 + 5350 = 8347) are
attributed to the three of us:

      1 domjudge
      2 rob
    113 Stijn van Drongelen
    126 Jeroen Schot
    149 neus
    866 Peter van de Werken
   1245 Thomas Rast
   1752 Christoph Krautz
   5350 Florian Jug
  10293 Thijs Kinkhorst
  20397 Jaap Eldering
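
The perl substitution above depends on the layout of the default blame
output (it stops the author name at the first "2" and expects a year
starting with "20"). As a rough sketch of a less fragile tally, and
assuming a git recent enough to support `git blame --line-porcelain`,
the author can be read from the porcelain headers instead. The function
name `blame_author_counts` is made up for this illustration.

```shell
# Sketch: count blamed lines per author over all tracked files.
# Assumes `git blame --line-porcelain` is available, which repeats the
# full commit header (including an "author <name>" line) for every
# blamed line. File content lines are prefixed with a tab, so a line
# starting with "author " can only be a header.
blame_author_counts() {
  git ls-files |
  while IFS= read -r f; do
    git blame -M --line-porcelain -- "$f"
  done |
  sed -n 's/^author //p' |
  sort | uniq -c | sort -n
}
```

Run inside the work tree, `blame_author_counts` should print the same
kind of "count name" table as above, although the totals are not
guaranteed to match the perl-based pipeline exactly.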

However, sanity checking this against the diffs of the single commits
shows quite a different number:

  git log --no-merges -p upstream/2.2.. | grep '^+' | grep -v -c '^+++'

gives only 4943 '+' lines, and you can easily verify with

  git shortlog -sn upstream/2.2..

that indeed all commits in that range are ours.  So why does blame
attribute more lines to us than we even added *in total*?
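
As a cross-check on the '+' count, the added lines can also be summed
from `git log --numstat`, which avoids grepping patch text. This is only
a sketch; the function name `added_lines` is hypothetical, and the
result may differ slightly from the grep count (for instance, an added
line whose content itself starts with "++" is dropped by the
`grep -v '^+++'` filter but counted here).

```shell
# Sketch: total number of added lines in a revision range, taken from
# the first column of --numstat output. Binary files show "-" in that
# column and are skipped by the numeric test, as are blank lines.
added_lines() {
  # $1: revision range, e.g. upstream/2.2..
  git log --no-merges --numstat --pretty=tformat: "$1" |
  awk '$1 ~ /^[0-9]+$/ { added += $1 } END { print added + 0 }'
}
```

For the range used above this would be called as
`added_lines upstream/2.2..`.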

Björn Steinbrink suggested on IRC that I use -M5 -C5 -C5 -C5, which
indeed reduces it to

      1 domjudge
      2 rob
    115 Stijn van Drongelen
    116 Jeroen Schot
    149 neus
    390 Florian Jug
    930 Peter van de Werken
   1209 Thomas Rast
   1612 Christoph Krautz
  11750 Thijs Kinkhorst
  24020 Jaap Eldering

Note especially the huge drop in Florian's numbers.  What's going on
here?
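
One way to narrow this down might be to list, per file, how many lines
blame assigns to a single author under each set of options, and then
diff the two listings to see which files account for the drop. The
following is a sketch under the same `--line-porcelain` assumption as
before; the helper name `blame_lines_by` is invented for this example.

```shell
# Sketch: per-file count of lines blamed to one author.
# $1 is the exact author name; all remaining arguments are passed to
# git blame unchanged (e.g. -M, or -M5 -C5 -C5 -C5).
blame_lines_by() {
  author=$1
  shift
  git ls-files |
  while IFS= read -r f; do
    # grep -c prints 0 but exits non-zero when nothing matches
    n=$(git blame "$@" --line-porcelain -- "$f" |
        grep -cxF "author $author" || true)
    if [ "$n" -gt 0 ]; then
      printf '%6d %s\n' "$n" "$f"
    fi
  done
}
```

For example:

  blame_lines_by "Florian Jug" -M > with-M.txt
  blame_lines_by "Florian Jug" -M5 -C5 -C5 -C5 > with-C.txt
  diff with-M.txt with-C.txt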

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
