Re: Opinions on changing add/add conflict resolution?

Elijah Newren <newren@xxxxxxxxx> · Mon, 12 Mar 2018 17:38:58 -0700

I'm worried that my attempt to extract add/add from the rest of the
discussion with rename/add and rename/rename resulted in a false sense
of simplification.  Trying to handle all the edge and corner cases and
remain consistent sometimes gets a little hairy.  Anyway...

On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Elijah Newren wrote:
>> On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>

> Sorry for the lack of clarity.  My understanding was that the proposed
> behavior was to write two files:
>
>         ${path}~HEAD
>         ${path}~MERGE

(Just to be clear: The proposed behavior was to do that only when the
colliding files were dissimilar, and otherwise two-way merging.)

> My proposal is instead to write one file:
>
>         ${path}
>
> with the content that would have gone to ${path}~HEAD.  This is what
> already happens when trying to merge binary files.

Thanks for clarifying.

I feel the analogy to merging binary files breaks down here in more
than one way:

1)

Merging binary files is more complex than you suggest here.  In
particular, when creating a virtual merge base, the resolution chosen
is not the version of the file from HEAD, but the version of the file
from the merge base.  Nasty problems would result otherwise for the
recursive case.

If we tried to match how merging binary files behaved, we run into the
problem that the colliding file conflict case has no common version of
the file from a merge base.  So the same strategy just doesn't apply.
The closest would be outright deleting both versions of the file for
the virtual merge base and punting to the outer merge to deal with it.
That might be okay, but seems really odd to me.  I feel like it'd
unnecessarily increase the number of conflicts users will see, though
maybe it wouldn't be horrible if this was only done when the files
were considered dissimilar anyway.

2)

merging binary files only has 3 versions of the file to store at a
single $path in the index, which fit naturally in stages 1-3;
rename/add and rename/rename(2to1) have 4 and 6, respectively.  Having
three versions of a file from a 3-way merge makes fairly intuitive
sense.  Have 4 or 6 versions of a file at a path from a 3-way merge is
inherently more complex.  I think that just using the version from
HEAD risks users resolving things incorrectly much more likely than in
the case of a binary merge.

> [...]
>>> Can you add something more about the motivation to the commit message?
>>> E.g. is this about performance, interaction with some tools, to
>>> support some particular workflow, etc?
>>
>> To be honest, I'm a little unsure how without even more excessive and
>> repetitive wording across commits.
>
> Simplest way IMHO is to just put the rationale in patch 5/5. :)  In
> other words, explain the rationale for the end-user facing change in the
> same patch that changes the end-user facing behavior.

Patches 3, 4, and 5 all change end-user facing behavior -- one patch
per conflict type to make the three types behave the same (the first
two patches add more testcases, and write a common function all three
types can use).  The rationale for the sets of changes is largely the
same, and is somewhat lengthy.  Should I repeat the rationale in full
in all three places?

>>                                     Let me attempt here, and maybe you
>> can suggest how to change my commit messages?
>>
>>   * When files are wildly dissimilar -- as you mentioned -- it'd be
>> easier for users to resolve conflicts if we wrote files out to
>> separate paths instead of two-way merging them.
>
> Today what we do (in both the wildly-dissimilar case and the
> less-dissimilar case) is write one proposed resolution to the worktree
> and put the competing versions in the index.  Tools like "git
> mergetool" are then able to pull the competing versions out of the
> index to allow showing them at the same time.
>
> My bias is that I've used VCSes before that wrote multiple competing
> files to the worktree and I have been happier with my experience
> resolving conflicts in git.  E.g. at any step I can run a build to try
> out the current proposed resolution, and there's less of a chance of
> accidentally commiting a ~HEAD file.
>
> [...]
>> There are three types of conflicts representing two (possibly
>> unrelated) files colliding at the same path: add/add, rename/add, and
>> rename/rename(2to1).  add/add does the two-way merge of the colliding
>> files, and the other two conflict types write the two colliding files
>> out to separate paths.
>
> Interesting.  I would be tempted to resolve this inconsistency the
> other way: by doing a half-hearted two-way merge (e.g. by picking one
> of the two versions of the colliding file) and marking the path as
> conflicted in the index.  That way it's more similar to edit/edit,
> too.

Your proposal is underspecified; there are more complicated cases
where your wording could have different meanings.

I also had a backup plan was to propose making rename/add and
rename/rename(2to1) behave more like add/add (instead of making the
behavior for all three a blend of their current behaviors), but the
problem is I have two backup proposals to choose between.  And it
sounds like you're adding a third, which follows your ~HEAD strategy
from above.  Let me see if I can explain why your proposal is
underspecified:

For rename/rename(2to1) we have up to 3 sets of conflicts.  Let's say
we renamed A->C in HEAD and B->C on the other side.  We first need to
do a three-way content merge on the first C (with the other two
versions of A), then we need to do a three-way content merge on the
second C (with the other two versions of B).  This gives us two
different C's, both with possible conflict markers, that want to be
stored at the same pathname.  Let me call these two versions of C, C1
and C2.  C1 and C2 may or may not be similar.  Resolving the fact that
both want to be stored at C is what gives us potentially a third set
of conflicts.

What's your resolution strategy here, for each combination of cases
(has conflict markers or not; is similar to the other C or not;
working on a virtual merge base or not)?

For reference there are two possible backup strategies I could propose here:

Backup Proposal #1: act stricly like add/add:
Do a two-way merge of C1 and C2.  If we end up with nested conflict
markers, oh well.  Good luck, user.

Backup Proposal #2: act like add/add, but only if C1 and C2 don't have
conflict markers.
If C1 or C2 have conflict markers, behave as currently -- record C1
and C2 to different paths.
If C1 and C2 do not have conflict markers, two-way merge them and
record in the worktree at C.

My question for your counter-proposal:
Do you record C1 or C2 as C?  Or do you record the version of C from
HEAD as C?  And what do you pick when creating a virtual merge base?

Problems I see regardless of the choice in your counter-proposal:
  * If you choose C from HEAD, you are ignoring 3 other versions of
"C" (as opposed to just one other version when merging a binary file);
will the user recognize that and know how to look at all four
versions?
  * If you choose C1 or C2, the user will see conflict markers; will
the user recognize that this is only a _subset_ of the conflicts, and
that there's a lot of content that was completely ignored?
  * There is no "merge base of C1 and C2" to use for the virtual merge
base.  What will you use?