Re: Rename edge case...

John Szakmeister <john@xxxxxxxxxxxxxxx> · Fri, 9 Nov 2012 21:01:03 -0500

On Fri, Nov 9, 2012 at 11:09 AM, Jeff King <peff@xxxxxxxx> wrote:
[snip]
> Right. If the source didn't go away, it would be a copy. We can do copy
> detection, but it is not quite as obvious what a merge should do with a
> copy (apply the change to the original? To the copy? In both places? You
> would really want hunk-level copy detection for it to make any sense).

Yeah, I wasn't advocating that.  More along the lines of what you're
talking about below...

> Usually git deals with this double-rename case through the use of
> "break" or "rewrite" detection. We notice that the old "foo.txt" and the
> new "foo.txt" do not look very much like each other, and break the
> modification apart into an add and a delete. That makes each side
> eligible for rename detection, and we can end up finding the pairs of
> renames above.

I did try using the -B option, and it did detect that foo.txt was
renamed to fooOld.txt, but it didn't show fooNew.txt being renamed to
foo.txt.  I'm running git 1.7.12.3.  It could be that 1.8.0 does
better, but I haven't tried.
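
For anyone following along, here is a toy Python model of the break-then-rename idea described above. It is only a sketch: the `classify` function, the 0.5 threshold, and the line-based `difflib` similarity are assumptions made for illustration, not git's actual diffcore code or scoring.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1], compared line by line (not git's estimate)."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def classify(old: dict, new: dict, use_break: bool, threshold: float = 0.5) -> dict:
    """Compare two {path: content} trees and report modifications and renames."""
    deleted = {p: c for p, c in old.items() if p not in new}
    added = {p: c for p, c in new.items() if p not in old}
    modified = []

    for path in sorted(old.keys() & new.keys()):
        if old[path] == new[path]:
            continue
        if use_break and similarity(old[path], new[path]) < threshold:
            # "Break" the pair: treat it as a delete plus an add so that
            # both halves become candidates for rename detection.
            deleted[path] = old[path]
            added[path] = new[path]
        else:
            modified.append(path)

    renames = []
    for src in sorted(deleted):
        # Pick the most similar remaining added path, if any.
        best = max(added, key=lambda dst: similarity(deleted[src], added[dst]),
                   default=None)
        if best is not None and similarity(deleted[src], added[best]) >= threshold:
            renames.append((src, best))
            del added[best]

    return {"modified": modified, "renames": renames}


# The double-rename case from this thread, with made-up file contents.
A = "alpha\nbeta\ngamma\ndelta\n"
B = "one\ntwo\nthree\nfour\n"
old_tree = {"foo.txt": A, "fooNew.txt": B}
new_tree = {"fooOld.txt": A, "foo.txt": B}

without_break = classify(old_tree, new_tree, use_break=False)
with_break = classify(old_tree, new_tree, use_break=True)
```

In this model, foo.txt looks like a plain modification when break detection is off, so neither rename can be paired up. With it on, both pairs become visible. Real git also rejoins a broken pair that finds no rename partner, which this sketch does not attempt.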

> So in theory it is just as simple as a one-liner to turn on break-detection
> in merge-recursive. Sadly, that only reveals more issues with how
> merge-recursive handles renames. See this thread, which has pointers to
> the breakages at the end:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/169944

Thank you.  I'll definitely read up on this.

> I've become convinced that the best way forward with merge-recursive is
> to scrap and rewrite it. It tries to do things in a muddled order, which
> makes it very brittle to changes like this. I think it needs to have an
> internal representation of the tree that can represent all of the
> conflicts, and then follow a few simple phases:
>
>   1. "structural" 3-way merge handling renames, breaks, typechanges,
>      etc. Each path in the tree might show things like D/F conflicts, or it
>      might show content-level merges that still need to happen, even if
>      the content from those merges is not coming from the same paths in
>      the source trees.
>
>   2. Resolve content-level 3-way merges at each path.
>
>   3. Compare the proposed tree to the working tree and list any problems
>      (e.g., untracked files or local modifications that will be
>      overwritten).
>
> Right now it tries to do these things interleaved as it processes paths,
> and as a result we've had many bugs (e.g., the content-level merge
> conflating the content originally at a path and something that was
> renamed into place, and missing corner cases where we actually overwrite
> untracked files that should be considered precious).
>
> But that is just off the top of my head. I haven't looked at the topic
> in quite a while (and I haven't even started working on any such
> rewrite).

That certainly sounds like a better approach.
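
To check my own understanding of the three phases, here is a rough Python sketch. Nothing in it comes from merge-recursive: the `Entry` type, the function names, the pre-computed rename maps, and the whole-file "content merge" are hypothetical simplifications, and it assumes at most one side renames any given path.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

Tree = Dict[str, str]  # path -> file content


class Entry(NamedTuple):
    """The three blobs that meet at one result path (None means absent)."""
    base: Optional[str]
    ours: Optional[str]
    theirs: Optional[str]


def structural_merge(base: Tree, ours: Tree, theirs: Tree,
                     ours_renames: Dict[str, str],
                     theirs_renames: Dict[str, str]) -> Dict[str, Entry]:
    """Phase 1: decide which blobs meet at each path, following renames."""
    plan = {}
    for path in base:
        o_path = ours_renames.get(path, path)
        t_path = theirs_renames.get(path, path)
        # Assumption: at most one side renames a given path.
        result_path = o_path if o_path != path else t_path
        plan[result_path] = Entry(base[path], ours.get(o_path), theirs.get(t_path))
    targets = set(ours_renames.values()) | set(theirs_renames.values())
    for path in (ours.keys() | theirs.keys()) - base.keys() - targets:
        plan[path] = Entry(None, ours.get(path), theirs.get(path))
    return plan


def resolve_content(entry: Entry) -> Tuple[Optional[str], bool]:
    """Phase 2: whole-file 3-way resolution; returns (content, conflicted)."""
    if entry.ours == entry.theirs:
        return entry.ours, False
    if entry.ours == entry.base:
        return entry.theirs, False
    if entry.theirs == entry.base:
        return entry.ours, False
    return None, True  # a real merge would try a line-level merge here


def check_worktree(merged: Tree, tracked: Set[str], worktree: Set[str]) -> List[str]:
    """Phase 3: list untracked working-tree files the result would overwrite."""
    return sorted(p for p in merged if p in worktree and p not in tracked)


# Our side did the double rename; their side edited the original foo.txt.
base = {"foo.txt": "A", "fooNew.txt": "B"}
ours = {"fooOld.txt": "A", "foo.txt": "B"}
theirs = {"foo.txt": "A2", "fooNew.txt": "B"}

plan = structural_merge(base, ours, theirs,
                        {"foo.txt": "fooOld.txt", "fooNew.txt": "foo.txt"}, {})
merged, conflicts = {}, []
for path, entry in plan.items():
    content, conflicted = resolve_content(entry)
    if conflicted:
        conflicts.append(path)
    elif content is not None:
        merged[path] = content
```

Because phase 1 has already settled which blobs belong together, their edit to the old foo.txt lands in fooOld.txt instead of being mixed with the content that was renamed into foo.txt. The working-tree check runs last, against the finished tree.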

>> So I played locally with a few ideas, and was surprised to find out
>> that even breaking up the two renames into two separate commits git
>> still didn't follow it.
>
> Right, because the merge only looks at the end points. Try doing a
> "diff -M" between your endpoints with and without "-B". We do not have
> any double-renames in git.git, but you can find "-B" helping a similar
> case: most of a file's content is moved elsewhere, but some small amount
> remains. For example, try this in git.git, with and without -B:
>
>   git show -M --stat --summary --patch 043a449
>
> It finds the rename only with "-B", which would help a merge (it also
> makes the diff shorter and more readable, as you can see what was
> changed as the content migrated to the new file).

I've played with the -B option before, and it's definitely nice in
certain cases.

Thank you for taking the time to write all this up.  It was very informative!

-John