Thomas Adam <thomas@xxxxxxxxxx> writes:

> What I did was first of all ascertain the number of original lines in each of
> the files I was interested in:
>
> for i in *.[ch]
> do
>     c="$(git --no-pager blame "$i" | grep -c '^\^')"
>     [ $c -gt 0 ] && echo "$i:$c"
> done | sort -t':' -k2 -nr

Another approach, which I used when I was curious how many of the
1244 lines Linus originally wrote for Git in 2005 remain in today's
codebase, goes the other way [*1*].

The "reverse" approach makes use of the -S option of blame to
fabricate a hypothetical history in which today's version is the
very first version of Git, and Linus's original is another version
that was built on it (eh, rather, reduced out of it).

    $ git tag initial e83c5163316f89
    $ cat >fake-history <<EOF
    $(git rev-parse initial) $(git rev-parse master)
    $(git rev-parse master)
    EOF

The list of files that Linus had in his original can be found with:

    $ git ls-tree -r --name-only initial

and you can iterate over them with a command like this:

    $ git blame -Sfake-history -s -b initial -- cache.h

A brief commentary on the options:

 * "-Sfake-history" points at the fake-history file, which uses the
   same format as the "graft" file, to establish the fake ancestry.
   The first line claims that Linus's 'initial' version has only
   one parent, which is our current version 'master' (in reality,
   Linus's 'initial' version did not have any parent, of course).
   The second line claims that our current version 'master' is a
   root commit without any parent.

 * "-s" squelches all metainformation other than the commit object
   name from the prefix of each line; "-b" further blanks out the
   commit object name of the "root" commit. Note that in this fake
   history, our current state in 'master' is what is blanked out.
The output may start like so:

                1) #ifndef CACHE_H
                2) #define CACHE_H
                3)
    e83c5163316 4) #include <stdio.h>
    e83c5163316 5) #include <sys/stat.h>
    e83c5163316 6) #include <fcntl.h>
    e83c5163316 7) #include <stddef.h>

The idea is that a line that is blamed to the "root" commit
(i.e. blank prefix) is one that survived from Linus's version down
to our current version. In the fake world, Linus started from
today's version and ended up with the same result in his version
for these lines.

A line that is blamed to e83c516 is something we do not have in
today's version and that was "added" by Linus in this fake world;
in reality, it is what we "lost" from Linus's original over time.

By adding -M and -C to the "git blame" command line, you'll find
more lines that survived from Linus's original by getting moved
around inside the same file and across file boundaries. By adding
-w, whitespace-only changes such as reindentation are also ignored.

I am not judging which is more correct, going in the forward
direction as your approach does or going in reverse, as I haven't
thought about it deeply enough.

[Reference]

*1* https://docs.google.com/file/d/0Bw3FApcOlPDhMFR3UldGSHFGcjQ/view
    Slide #11 was created using the above method.
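As a rough sketch of how the per-file counting could be automated
(this is not the script that produced the slide, just one way to
wire the commands above together): the function names
count_survivors and survivors_in_tree are made up for illustration,
and the code assumes that the 'initial' tag and the fake-history
file from above exist in the current directory, and that a line
blanked by "-b" begins with a space.

```shell
#!/bin/sh

# count_survivors (hypothetical helper): read "git blame -s -b"
# output on stdin and print "<survived> <total>".  A line whose
# commit-name prefix is blank was blamed to the fake "root"
# (today's version), i.e. it survived from the original.
count_survivors () {
	awk '
		/^ / { survived++ }
		END  { printf "%d %d\n", survived, NR }
	'
}

# survivors_in_tree (hypothetical helper): run the reverse blame
# for every path recorded in the tree of "initial".
survivors_in_tree () {
	git ls-tree -r --name-only initial |
	while IFS= read -r path
	do
		printf '%s: ' "$path"
		git blame -Sfake-history -s -b initial -- "$path" |
		count_survivors
	done
}
```

Running survivors_in_tree at the top level of the working tree
prints one "path: survived total" line per file. Adding -M, -C or
-w to the blame invocation inside the loop would apply the same
relaxations as described above.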