Re: git-blame not tracking copies

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andy Parkins <andyparkins@xxxxxxxxx> writes:

> The issues are
>
>  - Blame2 says all the lines come from commit 4, when actually they
>    come from commits 1, 2 and 3.  It was pointed out that this is
>    particularly annoying because the file is an exact copy and so the
>    copy has the same hash as the original so should be easy to spot
>
>  - The output isn't stable, even if blame2 had a good reason for not
>    assigning lines 1 and 2 to their correct commits, why isn't the same
>    true in blame3?
>
>  - Blame3 incorrectly ascribes line 4 to commit 4, when it should have
>    remained as it was in blame1 - to commit 3.

This turns out to be a simple and stupid boundary error.

The algorithm passes the blame by (ab)using diff to find the
common section.  We run an equivalent of "diff -u0", notice the
lines that come out of it -- the lines that do not appear are
common ones.  Simply put, if we have this hunk at the beginning:

	@@ -4,1 +4,2 @@
        -a
        +A
        +B

the only information we are interested in this hunk are (1) that
the hunk begins at line 4 in the postimage, so lines 1,2,3 in
the postimage are the same as the preimage; and (2) that the
different part ends just before line 6 (the hunk has two lines
in the postimage which we know do not match the preimage).  So
if the next hunk from the diff look like this...

	@@ -6,1 +8,1 @@
        -e
        +E

then we know lines 6 and 7 in the postimage are the same as the
preimage.

We need to use the information (2) from the last hunk in the
patch and blame all the remaining lines to the parent.

When we are assigning blame from one file in the "current"
commit down to one (possibly different) file in its parent
commit in pass_blame_to_parent() function, we grab patch between
the two, and we have the final "the rest are the same" call to
blame_chunk().  However, the code to assign blame for only part
of a file to unrelated file in its parent (i.e. -M/-C), the
corresponding code in find_copy_in_blob() to use the diff output
was missing this "the rest are the same" handling.

A 3-patch series follows.



-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]