On Thu, Apr 29, 2010 at 07:12:27PM -0400, Jay Soffian wrote:

> Let's say you've got a repo with ~ 40K files and 35K commits.
> Well-packed .git is about 800MB.
>
> You want to find out how many lines of code a particular group of
> individuals has contributed to HEAD.
>
> The naive solution is to run git blame on all 40K files, grep'ing for
> just the authors you want.

With the exception of your "blame only those files that you know your
authors have touched" optimization, I think you pretty much have to do
this. Anything else will just be reimplementing blame. You can't throw
away most content prematurely, because it may end up blaming to your
authors of interest eventually.

I think this is also what Junio ended up doing when presenting at
GitTogether '08:

  http://userweb.kernel.org/~junio/200810-Chron.pdf

In theory you might be able to do multi-file blame faster. I would be
curious to see the performance difference between:

  $ git blame file1 file2 ;# not actually implemented

and

  $ for i in file1 file2; do git blame $i; done

Much of the work is O(content), but there is some overlap in walking
the history and generating diffs.

-Peff
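
[For reference, a minimal sketch of the brute-force approach discussed
above; this is not the method from Junio's slides, and it assumes a git
new enough to have blame's --line-porcelain output, which repeats the
"author" header for every blamed line. The grep at the end is where you
would filter for your group of authors.]

  # Sketch: blame every file in HEAD and tally blamed lines per author.
  git ls-tree -r --name-only HEAD |
  while read -r path; do
      # --line-porcelain emits full commit info (including "author")
      # for every line, which makes the counting below trivial.
      git blame --line-porcelain HEAD -- "$path"
  done |
  grep '^author ' | sort | uniq -c | sort -rn

[The output is a count of blamed lines next to each author name; pipe
it through a final grep for the names you care about.]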