Re: [RFC] Git rerere and non-conflicting changes during conflict resolution

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 25 Jul 2017 13:26:34 -0700

Jeff King <peff@xxxxxxxx> writes:

>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
>
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.

Correct.  

This is not even a limitation but is outside the scope of rerere.
Let's understand that first.

A semantic conflict that requires an evil merge, i.e. one that has
to touch a file that is not involved in any textual conflict during
the merge, can happen even if there is *NO* textual merge conflict.

Imagine that there is a global variable 'xyzzy' used in many places
in the code, and then a side branch forks from the mainline.  The
mainline renames the variable to 'frotz' in the entire codebase,
while the side branch adds one more place that the variable is used
under its original name.  Then you merge these two branches.  This
will textually merge cleanly if the place the side branch adds a new
mention of 'xyzzy' is textually far from any block of text in the
common ancestor that has been updated on the mainline while these
two branches diverged.

"git checkout mainline && git merge side" will cleanly automerge,
yet the result is not correct.  The new mention of 'xyzzy' added by
the merge needs to be corrected to 'frotz'.
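
To make that concrete, here is a toy model in Python.  It is not how
Git merges: files are reduced to named regions standing in for hunks,
and merge3() and the region names are made up for illustration.  The
usual three-way rule applied per region reports no conflict, yet the
result still mentions 'xyzzy':

```python
def merge3(base, ours, theirs):
    """Toy three-way merge over named regions (a stand-in for hunks).

    Returns (result, conflicts); a conflicted region keeps "ours".
    """
    result, conflicts = {}, []
    for region in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(region), ours.get(region), theirs.get(region)
        if o == t or t == b:
            picked = o          # both sides agree, or only our side changed
        elif o == b:
            picked = t          # only their side changed
        else:
            conflicts.append(region)
            picked = o          # both sides changed it differently
        if picked is not None:
            result[region] = picked
    return result, conflicts


base = {"decl": "int xyzzy;", "use1": "xyzzy = 1;"}
# The mainline renames the variable everywhere it is used.
mainline = {"decl": "int frotz;", "use1": "frotz = 1;"}
# The side branch adds one more use, far from anything the mainline touched.
side = dict(base, use2="xyzzy += 2;")

merged, conflicts = merge3(base, mainline, side)
stale = sorted(r for r, text in merged.items() if "xyzzy" in text)
print("conflicts:", conflicts)
print("regions still using the old name:", stale)
```

The merge machinery has nothing to complain about here; only somebody
(or something) that knows about the rename can tell that the "use2"
region is wrong.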

Now we take that as the baseline and further imagine that during the
time these two branches diverged, the mainline also updated some
documentation of something totally unrelated to 'xyzzy' vs 'frotz'
variable.  Perhaps README was updated, or something.  The side
branch also updated the same file in a different way.  This time,
the changes to this same file may result in textual conflict.

"git checkout mainline && git merge side" will result in a conflict,
whose resolution may be recorded by rerere for that file.  It should
be crystal clear that this conflict does *not* have anything to do
with the semantic conflict between 'xyzzy' vs 'frotz'.  

The realization we must draw from the above observation is that the
semantic conflict, which is what the "merge-fix" machinery in the
Reintegrate script you cited in your message tries to help with,
fundamentally cannot be tied to any textual merge conflict that may
(or may not) happen.  That is what makes the issue outside the scope
of rerere.
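
A rough sketch of why, again as a toy in Python.  The assumption here
is only what Peff said above, that rerere maps conflicted hunks to
their resolutions; real rerere normalizes the conflict and keys on a
hash of it, so the tuple key, the function name and the file contents
below are all stand-ins.  The point is that a path that merged
cleanly never produces a key to look up:

```python
def replay_resolutions(recorded, conflicted_hunks):
    """Look up each conflicted hunk in the recorded resolutions.

    Paths that merged cleanly are never passed in, so nothing recorded
    can ever be applied to them.
    """
    return {path: recorded[hunk]
            for path, hunk in conflicted_hunks.items()
            if hunk in recorded}


# Recorded when README conflicted during an earlier merge.
readme_hunk = ("Build with make.", "Build with make -j.")  # (mainline, side)
recorded = {readme_hunk: "Build with make (or make -j)."}

# The next merge: README conflicts the same way, main.c merges cleanly.
conflicted_hunks = {"README": readme_hunk}
cleanly_merged = {"main.c": "int frotz;\nxyzzy += 2;\n"}

replayed = replay_resolutions(recorded, conflicted_hunks)
untouched = sorted(set(cleanly_merged) - set(replayed))
print("replayed:", replayed)
print("never looked at:", untouched)
```

README gets its earlier resolution back, while main.c, the file that
actually needs the 'xyzzy' to 'frotz' tweak, is never looked at.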

The above is not to say that the need to record and replay such evil
merges to solve semantic conflict does not exist.  Far from it.  It
is just clarifying that it is a wrong approach to try to "teach"
rerere to somehow handle that case as well.

If we wanted to port the "merge-fix" logic, and I do wish it would
happen some day, the update belongs to "git merge".

You were too kind to call the "merge-fix" logic in Reintegrate "the
state of the art", but I am not happy about its limitation.  Here
are the things I wish to have in an ideal version of the "merge-fix"
logic, which does not exist yet:

 * There is a database of "to be cherry-picked" commits, keyed by a
   pair of branch names.  That is, given branches A and B, the
   database will return 0 or more commits that can be cherry-picked.
   The order of branch names in the pair is immaterial, i.e. asking
   the database for cherry-pickable commits keyed by <A, B> and
   keyed by <B, A> will yield the identical set of commits.

 * When merging commit X to commit Y, "git merge" in the ideal world
   does the following:

   - It first does what it currently does, i.e. three-way merge with
     the merge strategy and applying rerere for re-application of an
     earlier resolution of textual conflicts.

   - Then, it looks up the database to find the keys <A, B> where
     A is in X but not in Y, and B is not in X but in Y.  The
     commits stored under these keys are cherry-picked and squashed
     into the result of the above.

The intent is that a pair <A, B> represents the mainline and side
branch in the above example, where A renamed 'xyzzy' to 'frotz' and
B added a new reference to 'xyzzy'.  And the cherry-pickable commit
found in the database is the one that tweaks the 'xyzzy' B adds
into 'frotz'.
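
The look-up described above could be sketched like this in Python.
None of this exists in Git; record_fix(), fixes_for_merge() and the
topic and commit names are hypothetical, and "what is in X" is
simplified to a set of topic names instead of real reachability:

```python
from typing import Dict, FrozenSet, List, Set

# Keyed by an unordered pair, so <A, B> and <B, A> are the same key.
MergeFixDB = Dict[FrozenSet[str], List[str]]


def record_fix(db: MergeFixDB, a: str, b: str, commit: str) -> None:
    """Remember that merging a and b together needs this fix commit."""
    db.setdefault(frozenset((a, b)), []).append(commit)


def fixes_for_merge(db: MergeFixDB, in_x: Set[str], in_y: Set[str]) -> List[str]:
    """Commits to cherry-pick when merging X into Y.

    A key applies when one of its members is only in X and the other
    is only in Y; which member is on which side does not matter.
    """
    only_x, only_y = in_x - in_y, in_y - in_x
    picks: List[str] = []
    for key, commits in db.items():
        if len(key) != 2:
            continue
        p, q = tuple(key)
        if (p in only_x and q in only_y) or (q in only_x and p in only_y):
            picks.extend(commits)
    return picks


db: MergeFixDB = {}
record_fix(db, "rename-xyzzy", "use-xyzzy", "fix-xyzzy-to-frotz")

# The two topics meet for the first time in this merge.
print(fixes_for_merge(db, {"rename-xyzzy"}, {"use-xyzzy"}))
# Y already has both topics, so the fix was applied when they first met.
print(fixes_for_merge(db, {"rename-xyzzy"}, {"rename-xyzzy", "use-xyzzy"}))
```

Keying on an unordered pair is what makes the direction of the merge
irrelevant, and checking "only in X" against "only in Y" is what keeps
the fix from being applied a second time.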

I said A and B in the above are branch names, but in the ideal
world, they can be commit object names (possibly in the middle of a
branch), as long as we can reliably update the database's keys every
time "git rebase" etc. rewrites commits.

To populate the database, we'd need to go in the reverse direction.

 * When merging branch B into branch A (or the other way around) for
   the first time, "git merge" would do what it currently does.

 * The user then concludes the merge to resolve *ONLY* the textual
   conflict, and commits the result.  It is important that no
   additional evil merge to correct for semantic conflicts is done
   in this step.  Note that if the auto-merge cleanly succeeds, this
   step may not even exist.

 * Then the user makes another commit to correct for semantic
   conflicts (aka "merge-fix").

 * Then the user tells Git that semantic conflicts were resolved and
   need to be recorded (just like one had to run "git rerere"
   manually, before "git commit" learned to do it automatically).
   This will result in the following:

   - The database is updated so that key <A, B> yields the
     "merge-fix" commit;

   - The head is detached at the tip of branch A before the merge;

   - "git merge B" is done again, which _should_ reproduce the state
     immediately after the user committed the "merge-fix";

   - The tip of branch A is reset to the result of the above.

The merge-fix logic in Reintegrate is a poor-man's emulation of the
above ideal.  A value its database yields is not a set of commits,
but a single commit, and instead of getting keyed by a pair of
branch names, the database is keyed by a single branch name
(i.e. recording "I had trouble when merging this branch" without
saying "... to the integration branch that already had this other
branch"), so the look-up does not have to do "A is in X but not in
Y, and B is not in X but in Y".  

It is still usable, but the database needs to be reorganized every
time the order of topics merged to 'pu' is changed.
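
A small Python sketch of why the order matters.  This models
Reintegrate only as loosely as it is described above (a fix fires when
the branch it is keyed on gets merged); the function names and topic
names are made up for illustration:

```python
def single_key_plan(order, fix_for_topic):
    """Single-branch key: the fix fires when that topic is merged,
    regardless of what is already on the integration branch."""
    return [(topic, fix_for_topic[topic])
            for topic in order if topic in fix_for_topic]


def pair_key_plan(order, fix_for_pair):
    """Pair key: the fix fires at the merge where the two topics meet."""
    merged, plan = set(), []
    for topic in order:
        for pair, fix in fix_for_pair.items():
            if topic in pair and (pair - {topic}) <= merged:
                plan.append((topic, fix))
        merged.add(topic)
    return plan


single = {"use-xyzzy": "fix"}
paired = {frozenset(("rename-xyzzy", "use-xyzzy")): "fix"}

rename_first = ["rename-xyzzy", "use-xyzzy"]
use_first = ["use-xyzzy", "rename-xyzzy"]

for order in (rename_first, use_first):
    print(order)
    print("  single key:", single_key_plan(order, single))
    print("  pair key:  ", pair_key_plan(order, paired))
```

With the rename merged first, both schemes apply the fix right after
'use-xyzzy' comes in.  With the order swapped, the single-key scheme
still fires at 'use-xyzzy', i.e. before the rename is there, which is
why its database has to be reorganized by hand, while the pair-keyed
one moves the fix to the merge of 'rename-xyzzy' on its own.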