Re: rebase - "fixup!" conflict applied at the wrong code location, why ?

Elijah Newren <newren@xxxxxxxxx> · Wed, 8 Jul 2020 22:07:07 -0600

On Wed, Jul 8, 2020 at 5:07 PM Philippe Blain
<levraiphilippeblain@xxxxxxxxx> wrote:
>
> Hello,
>
> I've been working on a branch for a while. I've been using `git commit --fixup`  and `git commit --squash`
> when I noticed that I had forgotten to add something to a previous commit.
> Today I did `git rebase --autosquash` to clean up my history, and the rebase failed at the
> first 'fixup!' commit with a conflict. However, the conflict is not located at the right place
> in the code (it's not in the right subroutine!). This is very surprising to me, and I would
> like to understand why it happens.
>
> Steps to reproduce:
>
> git clone -b branch-to-be-rebased https://github.com/phil-blain/CICE.git cice
> cd cice
> git rebase -i --autosquash my-first-commit
> # save the todo list without modifications
>   Auto-merging <file>
>   CONFLICT (content): Merge conflict in <file>
>   error: could not apply e8bfa55... fixup! <commit message of f4e1ae6>
> # the rebase stops at f4e1ae6
> git diff
> # tangential question : for some reason the hunk header does not appear here, I don't know why...
> git diff -2  # but it appears here
> git grep -p -e '<<<<<<< HEAD' -e '>>>>>>> e8bfa55...'  # or here
> # ok, the conflict appears in subroutine 'picard_solver'
> git show REBASE_HEAD -U5
> # but the original "fixup!" commit only modifies the subroutine 'anderson_solver' !!
>
> I would have expected that the conflict be created around lines 1118-1132
> (line numbers in f4e1ae6), in the 'anderson_solver' subroutine.
>
> I don't know if this plays a part here, but commit f4e1ae6 (where the rebase stops)
> is the commit where the 'anderson_solver' subroutine is added to the code...
>
> Thanks,
>
> Philippe.

If you take a look where the rebase stops, you see:

$ git ls-files -u
100644 ee4377f1ec6836fa05573976a473373906c37d9f 1
cicecore/cicedynB/dynamics/ice_dyn_vp.F90
100644 30c699ac371c2a751052fa98d04317e84a96ec47 2
cicecore/cicedynB/dynamics/ice_dyn_vp.F90
100644 276f224e9048fe0bbd7c25822695049547362c87 3
cicecore/cicedynB/dynamics/ice_dyn_vp.F90

The difference from the merge base to "other" (index 3) is pretty
tiny, you just moved one line in the "anderson_solver" subroutine
about 10 lines down.  The more interesting difference is from the
merge base to "ours" (index 2), seen with:

$ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90
:2:cicecore/cicedynB/dynamics/ice_dyn_vp.F90

If you search for "anderson_solver" in that diff, you see that the
reconstructed diff looks like "subroutine anderson_solver" is renamed
to "subroutine picard_solver" with similar but slightly different
arguments.  The diff routine successfully finds lots of lines common
between these two subroutines so it seems quite logical as a way of
representing the changes, especially since they occur near each other
in the order of the file.  If you keep looking further in the diff, it
says that a few hundred lines later there's another subroutine rename,
this time from "subroutine calc_zeta_Pr" to "subroutine
anderson_solver".  Looking at it this way, it's quite natural to apply
the change from the rebased commit to the picard_solver since the
other side of history "renamed" anderson_solver to picard_solver.

This might not make as much sense to you, since you tend to think in
terms of parse trees and "subroutines" are a high level unit.  But
diffs work in terms of lines and have no knowledge of any kind of file
structure or semantics.  If you have a string of functions A,B,C and
insert a new function Z on one side of history somewhere in a file
that happens to look a lot like existing functions (meaning several
identical lines between the two), then diffs won't necessarily treat
it as a contiguous block of inserted lines but instead compare
function Z on one side to function B on the other, then function B on
the first side to function C on the second, etc.

There are alternative diff algorithms that try to minimize the number
of changed lines shown by the diff, such as --histogram and
--patience, and they do shrink this particular diff, but they both
yield the same treatment of comparing "subroutine anderson_solver" to
"subroutine picard_solver" essentially treating it as a rename.  To
get a merge like you want, you'd need some kind of higher level
semantic merge, or at least need functions to not have so many lines
in common between them.

This kind of problem also causes big issues when reordering functions
in a file.  line-by-line diffs and the diff3 merge algorithm tend to
really struggle with those.  (See e.g.
https://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf if you want
to read up more on these.)

Hope that helps,
Elijah