Re: problem with git detecting proper renames

On Nov 29, 2007, at 11:44 AM, Linus Torvalds wrote:

On Thu, 29 Nov 2007, Kumar Gala wrote:

I did some git-mv and got the following:

The problem is that git seems confused about which file is associated with
which source.

Well, I wouldn't say "confused". It found multiple identical options for the source, and picked the first one (where "first one" may not be obvious
to a human, it can depend on an internal hash order).

But if you have the resultant git tree somewhere public (or just send me the exact "git mv" and revision to recreate), I'll happily give it a look, to see if we can improve our heuristics to be closer to what a human would
expect.

For example, in this case, it looks like there were two totally identical "init.S" files that got renamed with the same identical content to two new names. YOU seem to expect that it would stay as two renames, but from a
content angle, since the two sources were identical, it's a totally
arbitrary choice whether it's a "copy one source to two destinations and delete the other source" or whether it's two cases of "move one source to another destination" (and the latter case also has the issue of which way
to move it).

(You also had two identical Makefiles with the exact same issue.)

So git doesn't care about how you did the rename, it only cares about the
end result, and the exact same way that it will detect a rename if you
implement it as a "copy file" and then a later "delete old file", it will also potentially go the other way, or just decide that identical contents
moved in different ways.
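
As a rough illustration of the point above, here is a minimal sketch in Python (not git's actual C code) of detecting exact renames purely from the before and after snapshots. The function name `exact_rename_candidates`, the dict-of-path-to-bytes tree representation, and the directory names in the example are all assumptions made up for this sketch. A plain SHA-1 of the content stands in for git's object hash.

```python
import hashlib
from collections import defaultdict


def exact_rename_candidates(old_tree, new_tree):
    """Map each added path to the deleted paths with identical content.

    old_tree and new_tree are hypothetical snapshots: dicts of path -> bytes.
    Nothing about how the change was made (mv, cp + rm, ...) is consulted;
    only the two end states are compared.
    """
    deleted = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}

    # Group deleted paths by a hash of their content.
    by_hash = defaultdict(list)
    for path, content in deleted.items():
        by_hash[hashlib.sha1(content).hexdigest()].append(path)

    # Every added path gets all deleted paths with the same content.
    return {
        path: sorted(by_hash.get(hashlib.sha1(content).hexdigest(), []))
        for path, content in added.items()
    }


# Two identical sources moved to two new names (hypothetical paths).
old = {"a/init.S": b"same\n", "b/init.S": b"same\n"}
new = {"c/head_a.S": b"same\n", "c/head_b.S": b"same\n"}
candidates = exact_rename_candidates(old, new)
```

Both destinations end up with the same two candidate sources, so from the content alone there is no way to tell which pairing the user had in mind.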

I was guessing most of this but wanted to make sure there wasn't some cool feature of git I wasn't aware of.

But we can certainly tweak the heuristics. For example, if we find
multiple identical renames, right now we just pick one fairly at random, and have no logic to prefer independent renames over "multiple copies and
a delete". But this code is actually fairly simple, and with a good
example I can easily add heuristics (for example, it probably *is* better to consider it to be two renames, just because the resulting diff will be
smaller - since a "delete" diff is much larger than a rename diff).
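
One way the "prefer independent renames" idea could look, as a hedged sketch only: when several sources and destinations share identical content, hand each destination a source that has not been used yet, and fall back to a copy only when the sources run out. The function `pair_identical` and the example paths are hypothetical and do not come from git.

```python
def pair_identical(sources, destinations):
    """Pair destinations with sources that all have identical content.

    Each source is consumed by at most one "rename". Only when no unused
    source is left does a destination become a "copy" of the first source.
    Returns a list of (source, destination, kind) tuples.
    """
    if not sources:
        return []

    unused = list(sources)
    pairs = []
    for dest in destinations:
        if unused:
            pairs.append((unused.pop(0), dest, "rename"))
        else:
            pairs.append((sources[0], dest, "copy"))
    return pairs


# Two identical sources, two destinations: reported as two renames
# rather than "copy one source twice and delete the other".
pairs = pair_identical(["a/init.S", "b/init.S"], ["c/head_a.S", "c/head_b.S"])
```

With this pairing no source is left over as a plain deletion, which is what keeps the resulting diff small.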

In the case of multiple identical matches can we look at the file name as a possible heuristic?
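
A sketch of what I have in mind, in Python just to show the idea: among candidates with identical content, prefer the one with the same basename as the destination, then the one sharing the most leading directories. The names `name_score` and `pick_source` and the example paths are made up for illustration and are not anything in git.

```python
import posixpath


def name_score(src, dest):
    """Score how closely a source path resembles a destination path.

    Returns (same_basename, shared_leading_dirs); larger tuples are better.
    """
    same_base = posixpath.basename(src) == posixpath.basename(dest)
    shared = 0
    for a, b in zip(src.split("/")[:-1], dest.split("/")[:-1]):
        if a != b:
            break
        shared += 1
    return (same_base, shared)


def pick_source(dest, candidates):
    """Pick the candidate source whose path best matches dest.

    On a complete tie, the first candidate in the list wins.
    """
    return max(candidates, key=lambda src: name_score(src, dest))


# Hypothetical paths: the candidate in the same directory wins.
best = pick_source("board/a/start.S", ["cpu/b/init.S", "board/a/init.S"])
```

This only breaks ties between otherwise equal matches; it would not change which files count as identical in the first place.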

- k