Re: rebase - "fixup!" conflict applied at the wrong code location, why ?

Philippe Blain <levraiphilippeblain@xxxxxxxxx> · Mon, 13 Jul 2020 22:42:07 -0400

Hi Elijah,

> Le 9 juil. 2020 à 00:07, Elijah Newren <newren@xxxxxxxxx> a écrit :
> 
> On Wed, Jul 8, 2020 at 5:07 PM Philippe Blain
> <levraiphilippeblain@xxxxxxxxx> wrote:
>> 
>> Hello,
>> 
>> I've been working on a branch for a while. I've been using `git commit --fixup`  and `git commit --squash`
>> when I noticed that I had forgotten to add something to a previous commit.
>> Today I did `git rebase --autosquash` to clean up my history, and the rebase failed at the
>> first 'fixup!' commit with a conflict. However, the conflict is not located at the right place
>> in the code (it's not in the right subroutine!). This is very surprising to me, and I would
>> like to understand why it happens.
>> 
>> Steps to reproduce:
>> 
>> git clone -b branch-to-be-rebased https://github.com/phil-blain/CICE.git cice
>> cd cice
>> git rebase -i --autosquash my-first-commit
>> # save the todo list without modifications
>>  Auto-merging <file>
>>  CONFLICT (content): Merge conflict in <file>
>>  error: could not apply e8bfa55... fixup! <commit message of f4e1ae6>
>> # the rebase stops at f4e1ae6
>> git diff
>> # tangential question : for some reason the hunk header does not appear here, I don't know why...
>> git diff -2  # but it appears here
>> git grep -p -e '<<<<<<< HEAD' -e '>>>>>>> e8bfa55...'  # or here
>> # ok, the conflict appears in subroutine 'picard_solver'
>> git show REBASE_HEAD -U5
>> # but the original "fixup!" commit only modifies the subroutine 'anderson_solver' !!
>> 
>> I would have expected that the conflict be created around lines 1118-1132
>> (line numbers in f4e1ae6), in the 'anderson_solver' subroutine.
>> 
>> I don't know if this plays a part here, but commit f4e1ae6 (where the rebase stops)
>> is the commit where the 'anderson_solver' subroutine is added to the code...
>> 
>> Thanks,
>> 
>> Philippe.
> 
> If you take a look where the rebase stops, you see:

First, thanks a lot for your answer. I have a few questions.

> 
> $ git ls-files -u
> 100644 ee4377f1ec6836fa05573976a473373906c37d9f 1
> cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 100644 30c699ac371c2a751052fa98d04317e84a96ec47 2
> cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 100644 276f224e9048fe0bbd7c25822695049547362c87 3
> cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 
> The difference from the merge base to "other" (index 3) is pretty
> tiny, you just moved one line in the "anderson_solver" subroutine
> about 10 lines down.  

Yes, the output from 
$ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90 :3:cicecore/cicedynB/dynamics/ice_dyn_vp.F90

seems to be the same as the one from
$ git show REBASE_HEAD

This is a little confusing to me, in the sense that I don't understand why 
the merge-base is what it is. At this point, I do

$ git merge-base HEAD REBASE_HEAD 
f4e1ae67b7d6ca36c6f3ea7c9da43d81caf24067

Ok, so 'git merge-base' finds that the merge-base between HEAD and
REBASE_HEAD is HEAD; this makes sense to me (no previous commits
have been rewritten so far, so REBASE_HEAD is directly ahead of HEAD).
But, if I try to find the commit that contains
the blob ee4377f1ec6836fa05573976a473373906c37d9f (index 1), I find
REBASE_HEAD's parent:

$ git log --all --find-object=ee4377f1ec6836fa05573976a473373906c37d9f --format='commmit %H%ntree %T%nparent %P%n%n    %s%n'
commmit e8bfa557d3c81b75116d6557784b0439b792a308
tree f6fecb8193c3b877f22bcb8f4d8d2c203e17f06f
parent 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61

    fixup! Add Anderson acceleration for Picard iteration

commmit 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61
tree 11fd096851015c0c16b793d9bbb5db039776483b
parent 63d4c73c1dd973f620307833bd363a1d5069d090

    ice_dyn_vp: introduce 'CICE_USE_LAPACK' preprocessor macro

$ blob=ee4377f1ec6836fa05573976a473373906c37d9f
$ parent=7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61
$ git ls-tree -r $parent^{tree} | grep $blob
100644 blob ee4377f1ec6836fa05573976a473373906c37d9f	cicecore/cicedynB/dynamics/ice_dyn_vp.F90

I don't understand why at this point of the rebase, Git 
determines that the merge-base between HEAD and 
REBASE_HEAD is REBASE_HEAD's parent... this commit
is not even an ancestor of HEAD (or maybe I don't understand
what the "merge-base" is in this context?)...

> The more interesting difference is from the
> merge base to "ours" (index 2), seen with:
> 
> $ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> :2:cicecore/cicedynB/dynamics/ice_dyn_vp.F90
> 
> If you search for "anderson_solver" in that diff, you see that the
> reconstructed diff looks like "subroutine anderson_solver" is renamed
> to "subroutine picard_solver" with similar but slightly different
> arguments.  The diff routine successfully finds lots of lines common
> between these two subroutines so it seems quite logical as a way of
> representing the changes, especially since they occur near each other
> in the order of the file.  If you keep looking further in the diff, it
> says that a few hundred lines later there's another subroutine rename,
> this time from "subroutine calc_zeta_Pr" to "subroutine
> anderson_solver".  Looking at it this way, it's quite natural to apply
> the change from the rebased commit to the picard_solver since the
> other side of history "renamed" anderson_solver to picard_solver.
> 
> This might not make as much sense to you, since you tend to think in
> terms of parse trees and "subroutines" are a high level unit.  But
> diffs work in terms of lines and have no knowledge of any kind of file
> structure or semantics.  If you have a string of functions A,B,C and
> insert a new function Z on one side of history somewhere in a file
> that happens to look a lot like existing functions (meaning several
> identical lines between the two), then diffs won't necessarily treat
> it as a contiguous block of inserted lines but instead compare
> function Z on one side to function B on the other, then function B on
> the first side to function C on the second, etc.
> 
> There are alternative diff algorithms that try to minimize the number
> of changed lines shown by the diff, such as --histogram and
> --patience, and they do shrink this particular diff, but they both
> yield the same treatment of comparing "subroutine anderson_solver" to
> "subroutine picard_solver" essentially treating it as a rename.  To
> get a merge like you want, you'd need some kind of higher level
> semantic merge, or at least need functions to not have so many lines
> in common between them.

Thanks for these explanations. The behaviour makes sense.

> 
> This kind of problem also causes big issues when reordering functions
> in a file.  line-by-line diffs and the diff3 merge algorithm tend to
> really struggle with those.  (See e.g.
> https://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf if you want
> to read up more on these.)

Thanks for the pointer! 

> 
> Hope that helps,
> Elijah

Thanks,
Philippe.