Re: Opinions on changing add/add conflict resolution?

Elijah Newren <newren@xxxxxxxxx> · Mon, 12 Mar 2018 14:26:02 -0700

Hi,

Cool, thanks for taking a look!

On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>
> My immediate reaction is that it seems inconsistent with the rest of
> merge behavior.  Why would add/add behave this way but edit/edit not
> behave this way?

Fair enough.  I have two separate reasons for believing that edit/edit
IS different than add/add; further, the current behavior for add/add
is inconsistent with the rest of merge behavior on a different axis.
I think it's helpful to get the motivation for my changes before
trying to explain why those are different and before delving into the
other inconsistency, so I'll add the explanation of this claim at the
end of this email.

> Would this behavior be configurable or unconditional?  I suspect I
> would want it turned off in my own use.
>
> On the other hand, in the case of wild difference between the two
> files, skipping the two-way merge and just writing one of the versions
> to the worktree (like we do for binary files) sounds like something I
> would like in my own use.

I think you just said the exact opposite thing in these last two
paragraphs; that you wouldn't want my proposed behavior and that you'd
want it.  I suspect that may mean that I misunderstood something you
said here.  Could you clarify?

> Can you add something more about the motivation to the commit message?
> E.g. is this about performance, interaction with some tools, to
> support some particular workflow, etc?

To be honest, I'm a little unsure how without even more excessive and
repetitive wording across commits.  Let me attempt here, and maybe you
can suggest how to change my commit messages?

  * When files are wildly dissimilar -- as you mentioned -- it'd be
easier for users to resolve conflicts if we wrote files out to
separate paths instead of two-way merging them.
  * There is a weird inconsistency between handling of add/add,
rename/add, and rename/rename(2to1).  I want this inconsistency fixed.
  * There is a significant performance optimization possible for
rename detection in git merge if we remove the above inconsistency and
treat these conflict types the same.  (Actually, we only need them to
be the same in the special case where a renamed file is unmodified on
the un-renamed side of history, but I don't want to special case that
situation because it sounds like a recipe for inconsistent results).
  * If we insist on these conflict types being handled differently
because there really is some important distinction between them, then
I'm almost certain that build_fake_ancestor() and it's usage in
am/rebase is broken/wrong.  This is because the usage of
build_fake_ancestor(), as currently implemented, will cause
mis-detection of one conflict type as another.

> Thanks and hope that helps,
> Jonathan

As promised above, here are my reasons for believing that edit/edit IS
fundamentally different than add/add for the behavior considered here,
as well as my explanation of the weird inconsistency add/add currently
has with the rest of merge behavior:

==Reason 1==
edit/edit means that ${path} existed in the merge base as well as both
sides.  It is more likely than not that ${path} on each side is
similar to the merge base and thus similar (though less so) to each
other.  On the other hand, add/add means ${path} did NOT exist in the
merge base.  Thus, we cannot use the same reason to believe they are
similar.  The only reason we have to assume similarity is the
filename, which, while a useful indicator, gives us a weird
inconsistency within git for handling pathname collisions -- see
"Weird inconsistency" below.

==Reason 2==
In merges, git does rename detection, but not copy or break detection.
In particular, break detection is what would allow us to find out that
the edited files are not similar to the original.  add/add conflicts
automatically come to us "pre-broken" (is there a better term for
knowing that two files are not paired?), though it is possible that
they just happen to be similar and should be paired/joined.

==Follow on to reason 2==
Not doing break detection means we get e.g. rename/add-source cases
wrong, so there are valid arguments for saying we should add break
detection.  However, it would be computationally expensive and would
require a fair amount of work re-thinking all the cases in
merge-recursive to make sure we get rename-breaks right (there are
FIXMEs and comments and testcases I added years ago to help someone
along the path if they want to try).  Since rename/add-source just
isn't a conflict type that occurs much (if it all) in real world
repositories, the motivation to implement it is somewhat low.

==Weird inconsistency git currently exhibits==
There are three types of conflicts representing two (possibly
unrelated) files colliding at the same path: add/add, rename/add, and
rename/rename(2to1).  add/add does the two-way merge of the colliding
files, and the other two conflict types write the two colliding files
out to separate paths.  I think the motivation behind the current
behavior was that add/add was written assuming filename-match implies
similarity, and the other two were written assuming filename-match
didn't imply anything and the files should be assumed dissimilar.
That seems weird to me.  I think all three should behave the same
(especially when the renamed file was unmodified on the un-renamed
side of history, and likely whenever there is no content conflicts for
the renamed file).