On 2019-01-23 at 11:26 Junio C Hamano <gitster@xxxxxxxxx> wrote: > Yeah, and if the original had two adjacent lines, and replacement > has three adjacent lines, the algorithm would not even know if > > - the first line in the original was split into first two in the > update and the second line was modified in place; or > > - the first line in the original was modified in place and the > second line was split into the latter two lines in the update > > In short, there is no answer to "what is the corresponding line of > this line before this commit changed it?" in general, and any > algorithm, as long as it tries to see what was the "corresponding > line" of the line that is blamed to a commit, would not produce > results human readers would expect all the time. > > As you said, heuristics may get us far enough to be useful, though > ;-). Yeah. We can do one more thing: when we ignore a change that added more lines than it removed, we can at least not report totally unrelated commits. For example, the parent of an ignored commit has this, say at line 11: commit-a 11) void new_func_1(void *x, void *y); commit-b 12) void new_func_2(void *x, void *y); commit-c 13) some_line_c commit-d 14) some_line_d After a commit 'X', we have: commit-X 11) void new_func_1(void *x, commit-X 12) void *y); commit-X 13) void new_func_2(void *x, commit-X 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d In my existing code, if you ignore commit X, the blames look like this: commit-a 11) void new_func_1(void *x, commit-b 12) void *y); commit-c 13) void new_func_2(void *x, commit-d 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d Lines 13 and 14 are blamed on the nearby commits C and D. The reason is the blame entry for X is four lines long, rooted at line 11, but when we look back through the history, we'll look at the parent's image of the file where that diff hunk is only two lines long. The extra two lines just blame whatever follows the diff hunk in the parent's image. In this case, it is just the lines C and D repeated again. I can detect this situation when we ignore the diffs from commit X. If X added more lines than it removed, then I only pass the number of lines to the parent that the parent had. The rest get their own blame_entry, marked 'unblamable', which I'll catch when we create the output. The parent can't find blame for lines that don't exist in its image of the file. With that change, the above example blames like this: commit-a 11) void new_func_1(void *x, commit-b 12) void *y); 00000000 13) void new_func_2(void *x, 00000000 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d As we discussed, we still can never be certain about which commits *should* be blamed for which lines. (Note that line 12 was blamed on B, though B was the commit for new_func_2(), not new_func_1()). But I can avoid blaming commits that just happen to be in the area and would only be 'correct' due to dumb luck. This also handles cases where an ignored commit only adds lines, which is a specific case of "added more than removed." Handling this case is a surprisingly small change. I'll post it once I sort out the tests and other comments on this patch series. For now, I went with unconditionally marking the commit as all 0s for the case where we know it is 'wrong.' I can do that conditionally based on blame.markIgnoredLines if you want, though I think any commit attributed to those lines would be misleading. Thanks, Barret