Re: Merging split files

Jeff King <peff@xxxxxxxx> · Tue, 29 Mar 2011 11:16:23 -0400

On Fri, Mar 18, 2011 at 09:22:36AM -0400, Stephen Bash wrote:

> In our previous release foo.cxx contained both the base class and a
> few subclasses.  Since then the number of subclasses has grown, and
> we've split foo.cxx (base and sub-classes) into foo-base.cxx (base
> class) and foo-defs.cxx (sub-classes).  Since the release, we've had a
> few bug fixes in foo.cxx on the maintenance branch, and need to merge
> those back to development.  When I did the merge Git identified
> foo.cxx as moved to foo-defs.cxx, which worked for most changes, but a
> few needed to be in foo-base.cxx.  In this case it was a pretty
> trivial manual resolution, but is there a method for handling merges
> of split files?

I don't think there is currently a good way to do this automatically.

The problem is that the closest merge-recursive gets to understanding
content movement is that it considers whole-file renames. So it sees
"foo.cxx became foo-defs.cxx", and applies changes to foo.cxx to
foo-defs.cxx, but it has no clue that foo-base.cxx is involved at
all. So at the very least, it would need to represent "foo.cxx has
split into foo-base.cxx and foo-defs.cxx", which is not something it
can currently handle. But more than that, you want to know _which_
parts moved to each file.
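
To make the limitation concrete, here is a toy model of whole-file
rename detection. It is only a sketch and not Git's actual algorithm:
the function name, the 0.5 threshold, and the use of Python's difflib
as the similarity measure are all assumptions for illustration. The
point is that each removed file is paired with at most one new file.

```python
import difflib


def detect_whole_file_rename(old_lines, added_files, min_score=0.5):
    """Pick the single most similar new file for a removed file.

    Hypothetical helper: `added_files` maps path -> list of lines.
    Returns one path or None; it can never report a split.
    """
    best_path, best_score = None, 0.0
    for path, lines in added_files.items():
        # Line-level similarity in [0, 1] between old and new content.
        score = difflib.SequenceMatcher(None, old_lines, lines).ratio()
        if score > best_score:
            best_path, best_score = path, score
    return best_path if best_score >= min_score else None


old = ["base1", "base2", "sub1", "sub2", "sub3", "sub4"]
added = {
    "foo-base.cxx": ["base1", "base2"],
    "foo-defs.cxx": ["sub1", "sub2", "sub3", "sub4"],
}
target = detect_whole_file_rename(old, added)
```

With this input the larger half wins, so every change to foo.cxx
would be sent to foo-defs.cxx, including the ones that belong in
foo-base.cxx.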

So I think the most flexible thing is to forget about file renames
altogether. They are just a rough version of the general idea of
content movement. In theory, we should be able to see that the content
we changed in foo.cxx no longer exists, and then start looking for
similar content elsewhere. Not similar _files_, but for the chunk of
content that changed between the merge base and the maintenance branch
(plus some surrounding context), find where that bit of content went.
And then try to merge our changes into that new bit of content.
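
A rough sketch of that search, assuming a simple sliding-window
comparison over the lines of each candidate file. Nothing like this
exists in Git today; `locate_chunk`, the 0.6 threshold, and the
difflib-based score are hypothetical choices for illustration.

```python
import difflib


def locate_chunk(chunk, files, threshold=0.6):
    """Find where a chunk of lines most plausibly went.

    Hypothetical helper: `files` maps path -> list of lines.
    Returns (path, start_index, score) or None if nothing is
    similar enough.
    """
    best = None
    n = len(chunk)
    for path, lines in files.items():
        # Slide a window the size of the chunk over the file; a file
        # shorter than the chunk is compared once as a whole.
        for start in range(max(1, len(lines) - n + 1)):
            window = lines[start:start + n]
            score = difflib.SequenceMatcher(None, chunk, window).ratio()
            if best is None or score > best[2]:
                best = (path, start, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


files = {
    "foo-base.cxx": ["class Base {", "  int size();", "};"],
    "foo-defs.cxx": ["class Sub : Base {", "  int size() { return 1; }", "};"],
}
chunk = ["class Base {", "  int size();"]
match = locate_chunk(chunk, files)
```

Here the chunk is found at the top of foo-base.cxx, regardless of
which file a whole-file rename heuristic would have picked.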

One problem is that when it fails, it fails pretty hard. With file
renames, your changes at least usually end up in the right file (your
present problem excluded), and you get some textual mess to clean up.
But with content-level renaming, I suspect in conflict cases we would
end up with no clue where the result goes (because the conflict means we
can't easily match up the content for similarity), and have to stick it
in the deleted file. On the other hand, it might simply work to keep
expanding the amount of context we consider for content similarity until
we find a match, which eventually would end up considering the whole
file, and generalize to a file rename.
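
The expanding-context idea might look something like the sketch below.
It is untested against real merges; `locate_with_growing_context`,
`best_window`, and the 0.7 threshold are made-up names and values, and
the line-based difflib score stands in for whatever similarity measure
would really be used.

```python
import difflib


def best_window(chunk, files):
    """Return (path, start, score) of the most similar window, or None."""
    best = None
    n = len(chunk)
    for path, lines in files.items():
        for start in range(max(1, len(lines) - n + 1)):
            window = lines[start:start + n]
            score = difflib.SequenceMatcher(None, chunk, window).ratio()
            if best is None or score > best[2]:
                best = (path, start, score)
    return best


def locate_with_growing_context(old_lines, start, end, files, threshold=0.7):
    """Widen the context around old_lines[start:end] until it matches.

    Returns (path, match_start, context_lines, score), or None once the
    whole old file has been tried without success.
    """
    for ctx in range(len(old_lines) + 1):
        lo = max(0, start - ctx)
        hi = min(len(old_lines), end + ctx)
        found = best_window(old_lines[lo:hi], files)
        if found is not None and found[2] >= threshold:
            return (found[0], found[1], ctx, found[2])
        if lo == 0 and hi == len(old_lines):
            # Already comparing the whole file: this is the
            # file-rename case, and it did not match either.
            break
    return None


old = ["a", "b", "X", "c", "d"]           # "X" is the region we changed
files = {"new.cxx": ["q", "a", "b", "Y", "c", "d", "r"]}  # other side has "Y"
result = locate_with_growing_context(old, 2, 3, files)
```

In this example the changed line alone matches nothing, one line of
context on each side is still below the threshold, and two lines of
context finally pin the region down in new.cxx.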

Implementing that inside of merge-recursive is likely to be pretty nasty
(even the current file-rename code is already pretty nasty). But it may
be possible to prototype something that runs after we hit the conflicted
state, like mergetool.

I definitely think it's an interesting area to work in, but I would have
to give it a lot of thought.

-Peff