Re: [WIP PATCH 0/3] implement merge strategy for submodule links

Johan Herland <johan@xxxxxxxxxxx> · Tue, 15 Jun 2010 01:59:42 +0200

On Monday 14 June 2010, Heiko Voigt wrote:
> On Sun, Jun 13, 2010 at 07:59:43PM +0200, Johan Herland wrote:
> > On Saturday 12 June 2010, Heiko Voigt wrote:
> > A better solution would be, to put it generally: Given a submodule
> > being part of a superproject conflict, if one of the candidate
> > submodule SHA1s is is a descendant of _all_ the other submodule SHA1
> > candidates, then choose that SHA1 as the proposed resolution (but
> > please leave the index entry "unmerged", so that the resolution must
> > be confirmed by the user).
> 
> Is there currently any logic to support a "suggested" merge resolution
> in git.git? I am not that familiar with code base yet and I do not think
> that I have seen something like that. Is it done somewhere already?

In the case of a regular file, you can replace the default working tree 
version (the one with conflict markers) by a version containing your 
suggested merge resolution, but NOT add that version to the index, so the 
index still thinks the file is unmerged. You then 'git add' the file to 
acknowledge the suggested resolution.

In the case of submodules, you would check out the suggested version of the 
submodule, but not add its SHA1 to the superproject index. Again, the user 
acknowledges the suggested resolution by 'git add'ing the submodule.

My point is that when Git tries to suggest merge resolutions, it should 
purposefully NOT add these to the index, so that the user HAS to acknowledge 
them. This is similar to the default behaviour of 'git rerere' which 
resolves your conflicts automatically, but does not touch the corresponding 
"unmerged" index entries, so that you manually have to 'git add' the result.

> > This removes all the "stable" branch magic from your patch. All you
> > need to look at are the candidate SHA1s and their relationship in the
> > commit graph. No refs involved.
> > 
> > In the A->B vs. A->C case above, we would see that C is a descendant of
> > B, and we would therefore choose C as a suggested conflict resolution,
> > which IMHO is a much better choice than D.
> > 
> > I still don't want to add a lot of auto-resolving cleverness to Git, as
> > it inevitably _will_ choose incorrectly sometimes, and in those
> > situations it will be much more confusing than if it didn't choose at
> > all.
> 
> I see your point. But nevertheless there is a specific workflow I target
> to support which is not supported by your approach:
> 
> Lets assume Alice creates a feature branch feature_a for her development
> and needs to modify the submodule and creates a branch there as well. At
> the same time Bob develops feature_b and also needs changes in the
> submodule and so he creates a feature branch there as well.
> 
> Assume we now have the following history in the submodule:
> 
>   B---C---D         [feature_a]
>  /         \
> A---E---F---G---K   [master]
>      \         /
>       H---I---J     [feature_b]
> 
> Now during the development of her branch Alice would link D in the
> superproject as it is the tip of her branch. Bob would do the same and
> link to J as his tip. Now Alice sends out her branch to the reviewers
> and after everybody is happy with it the maintainer merges her branch
> first. The superproject links to D.

No. The superproject would get a conflict between the A->D and A->F updates 
of the submodule. The correct resolution would be to go into the submodule, 
do the merge to produce G, and then record this as the correct merge 
resolution in the superproject.

You want Git to do this automatically for you, whereas I think that Git 
should not be that "clever", because there are situations (as I've 
demonstrated previously in this thread) where the "cleverness" would do The 
Wrong Thing.

> Now Bob does the same and the
> maintainer wants to merge his branch and gets a merge conflict because D
> and J do not have a parent/children relationship.

Well, s/D/G/, but your point still stands. And the correct resolution is, of 
course, to merge G and J to produce K, and then record K in the superproject 
as the correct merge resolution.

Again, the question is whether Git should do these submodule merges 
automatically, or not.

> I think this is a fairly natural pattern which evolves from the use of
> feature branches in git. So I would like to make git behave naturally
> for this workflow and automatically merge.

Please keep in mind that your workflow is but one, and Git has to support a 
wide variety of different workflows without breaking down.

It actually seems to me that - in your workflow scenario, where the 
submodule seems to be fairly tightly coupled to the superproject - you would 
be better off using branch-tracking submodules (recently discussed in the 
thread called 'RFC: Making submodules "track" branches').

When using branch-tracking submodules, Alice would configure the submodule 
to track the "feature_a" branch, while Bob would configure the submodule to 
track the "feature_b" branch. When merging these branches, the correct merge 
resolution would be (after having merged the submodule feature branches back 
into the submodule "master") to track the "master" branch in the submodule.

When merging the two feature branches back into "master" there would (in 
addition to the conflicted submodule entry) be conflicts in the .gitmodules 
file on which submodule branch to track (I'm following Ævar's proposal 
here), and the resolution of _that_ conflict would specify which submodule 
branch/version to use in the resolved merge.

Now, in your proposal, you would have an _additional_ config variable for 
controlling which submodule branch is equivalent to "master" in the above 
example. If this branch were to be different from the "tracked" branch (as 
defined in Ævar's proposal), you would be in the deeply confusing situation 
of having the merge resolved to the tip of one branch, while you've told the 
submodule to track a _different_ branch. The only thing that makes sense 
would be for these two variables to always be identical, at which point you 
should simply eliminate one of them.

I guess what I'm getting at (sorry for taking a while to get here) is that 
you could maybe solve your problem by a combination of what I suggested in 
my previous mail, plus the use of branch-tracking submodules. There are 
still some things to be worked out here, but I don't believe adding an 
almost-but-not-quite-submodule-branch-tracking option is the best way to go.

> Now your point is that master could be wrong and you are right, but
> normal merges can go wrong in a similar way. Just imagine this:
> 
> Alice adds a parameter to the static function somefunc() and changes all
> callsites of it in her branch. Independently Bob writes new code in
> his branch that uses somefunc() with the old signature. When both
> branches are merged git has no chance of doing it right and the code
> will not compile. So even normal merging is always a little heuristic.
> Question is: How well does the heuristic perform in practise.

True, but I don't necessarily accept that one sometimes-wrong heuristic 
justifies another sometimes-wrong heuristic. Follow that logic, and we can 
pile on heuristics until we almost always get something wrong...

> > Ok, so you can configure exactly which branch(es) you consider stable.
> > I'd still much rather prefer the approach I outlined above, which does
> > away with all the "stable" branch magic, and only considers the commit
> > ancestry directly.
> 
> Ok what do you think about combining both approaches: If no stable
> branches are configured we default to your strategy and if the user
> wants some magic (I mean isn't that what git is all about: magic)
> configuring stable branches will enable git to resolve conflicts like
> the ones I described above.

FWIW, IMHO Git is NOT about magic at all. It even says so at the top of the 
git(1) manual page: "git - the stupid content tracker". And both Junio and 
Linus have repeatedly argued how Git purposefully only auto-resolves the 
_simple_ cases, and leaves the _hard_ cases to the user, since trying to be 
clever about the hard cases inevitably leads to more confusion and insanity.

In this case, your scenario/proposal just about crosses into what I consider 
the _hard_ space, which is why I'm critical of the cleverness.

As for combining both approaches (subject to some config option), I guess 
that could work, but I'd certainly like to see a significant amount of 
support for your proposal before we go there.

> My feeling is that in practise automatic merging into stable branches
> will work well and the cases of failure will be neglectable to not
> happening at all. So my approach would be to go ahead, implement the
> strategy and let people play around with it so we can collect some real
> life data whether it is helping or making matters worse.

Feel free to post the patches, if you can spend the time making them. So 
far, there's been no other feedback in this thread, so maybe I'm alone in my 
worries...

Cheers,

...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html