Fwd: Opinions on changing add/add conflict resolution?

Elijah Newren <newren@xxxxxxxxx> · Mon, 12 Mar 2018 19:53:10 -0700

[Re-sending because this computer happened to have plain-text mode
turned off for some weird reason, and thus the email bounced]

Hi,

On Mon, Mar 12, 2018 at 3:19 PM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> Does this mean that e.g. in this case of merging two files, one
> containing "foo" and one containing "bar":
>
>     (
>         rm -rf /tmp/test.git &&
>         git init /tmp/test.git &&
>         cd /tmp/test.git &&
>         echo foo >README &&
>         git add README &&
>         git commit -mfoo &&
>         git checkout --orphan trunk &&
>         git reset --hard &&
>         echo bar >README &&
>         git add README &&
>         git commit -mbar &&
>         git merge --allow-unrelated-histories master;
>         cat README
>     )
>
> That instead of getting:
>
>     <<<<<<< HEAD
>     bar
>     =======
>     foo
>     >>>>>>> master
>
> I'd now get these split into different files?

As currently implemented, yes.  However, I was more concerned the idea
of handling files differently based on whether or not they were
similar, rather than on what the precise definition of "similar" is
for this context.

As far as the definition of similarity goes, estimate_similarity() is
currently used by rename detection to compare files recorded at
different pathnames.  By contrast, in this context, we are comparing
two files which were recorded with the same pathname.  That suggests
the heuristic could be a bit different and use more than just
estimate_similarity().  (e.g. "We consider these files similar IF more
than 50% of the lines match OR both files are less than 2K.")

> I don't mind this being a configurable option if you want it, but I
> don't think it should be on by default, reasons:

I don't think a configurable option makes sense, at least not for my
purposes.  Having rename/rename conflicts be "safely" mis-detected as
rename/add or add/add, and having rename/add conflicts be "safely"
mis-detected as add/add is my overriding concern.  Thus, making these
three conflict types behave consistently is what I need.  Options
would make that more difficult for me, and would thus feel like a step
backwards.

git am/rebase has been doing such mis-detections for years (almost
since the "dawn" of git time), but it feels really broken to me
because the conflict types aren't handled consistently.  (The facts
that (a) I'm the only one that has added rename/add testcases to
git.git, (b) that I've added all but one of the rename/rename(2to1)
testcases to the git.git testsuite, and (c) that rename/add has had
multiple bugs for many years, all combine to suggest to me that folks
just don't hit those conflict types in practice and thus that they
just aren't noticing this breakage -- yet.)

I also want to allow such mis-detections for cherry-picks and merges
because of the significant (nearly order-of-magnitude in some cases)
performance improvements I can get in rename detection if it's
allowed.

>  1) There's lots of cases where we totally screw up the "is this
>     similar?" check, in particular with small files.
>
>     E.g. let's say you have a config file like 'fs-path "/tmp/git"' and
>     in two branches you change that to 'fs-path "/opt/git"' and 'fs-path
>     "/var/git"'. The rename detection will think this these have nothing
>     to do with each other since they share no common lines, but to a
>     human reader they're really similar, and would make sense in the
>     context of resolving a bigger merge where /{opt,var}/git changes are
>     conflicting.
>
>     This is not some theoretical concern, there's lots of things that
>     e.g. use small 5-10 line config files to configure some app that
>     because of some combo of indentation changes and changing a couple
>     of lines will make git's rename detection totally give up, but to a
>     human reader they're 95% the same.

Fair enough.  The small files case could potentially be handled by
just changing the similarity metric for these conflict types, as noted
above.  If it's a small file, that might be the easiest way for a user
to deal with it too.

I'm not sure I see the problem with the bigger files, though.  If you
have bigger files with less than 50% of the lines matching, then
you'll essentially end up with a great big conflict block with one
file on one side and the other file on the other side, which doesn't
seem that different to me than having them be in two separate files.
In fact, separate files seems easier to deal with because then the
user can run e.g. 'git diff --no-index --color-words FILE1 FILE2',
something that they can't do when it's in one file.  That has bothered
me more than once, and made me wish they were just in separate files.

>  2) This will play havoc with already established merge tools on top of
>     git which a lot of users use instead of manually resolving these in
>     vi or whatever.
>
>     If we made this the default they'd need to to deal with this new
>     state, and even if it's not the default we'll have some confused
>     users wondering why Emacs Ediff or whatever isn't showing the right
>     thing because it isn't supporting this yet.
>
> So actually, given that last point in #2 I'm slightly negative on the
> whole thing, but maybe splitting it into some new format existing tools
> don't understand is compelling enough to justify the downstream breakage.

To me, this is a bigger concern.  We have changed conflict resolutions
in various ways at various times over the years, so I don't think the
output should be considered fixed in stone, but I am very sympathetic
to arguments that this particular change is too painful.

There is a possible way to allow for a transition period, though.
Junio has already requested that some of the changes I'm working on be
done as a new merge strategy that is a rewrite of merge-recursive
(https://public-inbox.org/git/xmqqk1ydkbx0.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/).
It would be a little unfortunate to not be able to use the new merge
strategy as an exact drop-in replacement of the current recursive
merge strategy, but it would provide a way for people to get some time
to migrate other tools.

However, I'm far more concerned with the three collision conflict
types having consistent behavior than I am with changing add/add
conflict handling.  And if your two concerns or Jonathan's concern
would prevent you from wanting to use the new merge strategy (and
possibly prevent it from becoming the default in the future), I'd much
rather just modify rename/add and rename/rename to behave like
add/add.  Would that be more to your liking?

Elijah