Re: [PATCH v2 2/3] blame: add the ability to ignore commits and their changes

Barret Rhoden <brho@xxxxxxxxxx> · Fri, 1 Feb 2019 15:37:02 -0500

On 2019-01-23 at 11:26 Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Yeah, and if the original had two adjacent lines, and replacement
> has three adjacent lines, the algorithm would not even know if 
> 
>  - the first line in the original was split into first two in the
>    update and the second line was modified in place; or
> 
>  - the first line in the original was modified in place and the
>    second line was split into the latter two lines in the update
> 
> In short, there is no answer to "what is the corresponding line of
> this line before this commit changed it?" in general, and any
> algorithm, as long as it tries to see what was the "corresponding
> line" of the line that is blamed to a commit, would not produce
> results human readers would expect all the time.
> 
> As you said, heuristics may get us far enough to be useful, though
> ;-).

Yeah.  We can do one more thing: when we ignore a change that added
more lines than it removed, we can at least not report totally
unrelated commits.  

For example, the parent of an ignored commit has this, say at line 11:

commit-a 11) void new_func_1(void *x, void *y);
commit-b 12) void new_func_2(void *x, void *y);
commit-c 13) some_line_c
commit-d 14) some_line_d

After a commit 'X', we have:

commit-X 11) void new_func_1(void *x,
commit-X 12)                 void *y);
commit-X 13) void new_func_2(void *x,
commit-X 14)                 void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d

In my existing code, if you ignore commit X, the blames look like this:

commit-a 11) void new_func_1(void *x,
commit-b 12)                 void *y);
commit-c 13) void new_func_2(void *x,
commit-d 14)                 void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d

Lines 13 and 14 are blamed on the nearby commits C and D.  The reason
is the blame entry for X is four lines long, rooted at line 11, but when
we look back through the history, we'll look at the parent's image of
the file where that diff hunk is only two lines long.  The extra two
lines just blame whatever follows the diff hunk in the parent's image.
In this case, it is just the lines C and D repeated again.

I can detect this situation when we ignore the diffs from commit X.  If
X added more lines than it removed, then I only pass the number of
lines to the parent that the parent had.  The rest get their own
blame_entry, marked 'unblamable', which I'll catch when we create the
output.  The parent can't find blame for lines that don't exist in its
image of the file.  

With that change, the above example blames like this:

commit-a 11) void new_func_1(void *x,
commit-b 12)                 void *y);
00000000 13) void new_func_2(void *x,
00000000 14)                 void *y);
commit-c 15) some_line_c
commit-d 16) some_line_d

As we discussed, we still can never be certain about which commits
*should* be blamed for which lines.  (Note that line 12 was blamed on
B, though B was the commit for new_func_2(), not new_func_1()).  But I
can avoid blaming commits that just happen to be in the area and would
only be 'correct' due to dumb luck.

This also handles cases where an ignored commit only adds lines,
which is a specific case of "added more than removed."

Handling this case is a surprisingly small change.  I'll post it once I
sort out the tests and other comments on this patch series.  

For now, I went with unconditionally marking the commit as all 0s for
the case where we know it is 'wrong.'  I can do that conditionally
based on blame.markIgnoredLines if you want, though I think any commit
attributed to those lines would be misleading.

Thanks,

Barret