Ryan Anderson <ryan@xxxxxxxxxxxxxx> writes: > On Thu, Oct 05, 2006 at 01:13:15AM -0700, Junio C Hamano wrote: >> It's been a while since we lost git_blame from %actions list. I >> am wondering maybe it's time to remove it, after 1.4.3 happens. > > I certainly have no objection. In fact, I sent a patch a moment ago. > (I didn't keep the cc: on it, I figured there was too high a chance of > mishap when pasting the cc: list.) So it's finally settled between annotate and blame. It is kind of sad to see one of them had to go while these stem from slightly different algorithm sketches [*1*]. But for 8 months of its existence, it served us well as the git-cvsserver backend. May it rest in peace. Having said that, there are a few things in git-blame that interested people may want to further look into. Annotation by git-blame is done by "passing the blame to parents" principle. You start from the final form of the blob, and compare it with its counterpart in the parent version (rename detection is used to pick which file in the parent version to compare against). The lines the commit inherited from its parent are not responsibility of the child so the algorithm passes blame on them to the parent. The lines the commit changed from the parent are blamed on the child. When this is done, the parent "temporarily" takes responsibility for those lines that child did not change -- it just becomes "suspect" for those lines when we compare parent and child. And then the algorithm goes further down the ancestry to give the parent the chance to exonerate itself by passing blames for the lines it is suspect for, by passing the blame to its parent. When sifting the lines into "inherited" and "our responsibility", internally git-blame runs "diff", which expresses the changes as "these lines are deleted and these are inserted by the child". Lines outside are clearly inherited from the parent. This has an interesting effect on blame output. Suppose the original file had two groups of lines; group A followed by group B. A commit changes the file so that it has group B followed by group A. What git-blame sees as diff between the two is either: -A B +A or +B A -B In either case, it would end up giving blame to the child for one group (the first diff blames the child for A lines) and pass the blame for the other one to the parent. If we used something other than "diff" (Delete Insert File vs File ;-)), that expresses changes as "these are moved from there, these are inserted anew" (call that "miff"), then we should be able to assign blame more accurately. The above example case would be expressed as "group A came from the top part of the parent, group B came from the bottom part of the parent". Passing of the blame based on that expression would blame the child for neither group of lines. Further, if we use "ciff" that expresses changes as "these are copied from there, these are inserted anew", we can do a lot more interesting thing. We can track code movement across files, and that is not limited to renames. For example, suppose that the parent had files F1 and F2 and the child moved a function and copy-and-pasted a comment block from F1 to F2, and we are annotating lines in F2. The current git-blame sees that the function and comment block appeared from nowhere into F2 and blames the child for them. However, when annotating F2, we could: - use concatenation of all files in the parent that was modified between parent and child (or just "all files in the parent" -- the difference is exactly like plain -C vs -C --find-copies-harder) as the source image; - use lines of F2 in the child as the destination image; - run "ciff" algorithm to see where each line of F2 in the child came from (either copied from existing file somewhere in the parent, or inserted anew by the child). This would find that the function and the comment block were copied from F1 in the parent. An interesting property of this is that when the parent passes down the blame for the function the child moved in the above example further to its parent, we do not necessarily have to run "ciff" algorithm on file F1 as the whole. We only need to give the function (i.e. the lines the parent is still suspect for) [*2*]. So this makes destination image fed to "ciff" smaller as more lines are blamed on children while digging deeper, which may compensate for the need to feed not just that file but other files for copy detection on the source image side. [*1*] I think annotate follows this sketch http://thread.gmane.org/gmane.comp.version-control.git/14819/focus=14867 while blame follows this sketch http://thread.gmane.org/gmane.comp.version-control.git/5453/focus=5483 [*2*] we may need to use a handful surrounding context lines for better identification of copy source by the "ciff" algorithm but that is a minor implementation detail. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html