Hi Elijah, > Le 9 juil. 2020 à 00:07, Elijah Newren <newren@xxxxxxxxx> a écrit : > > On Wed, Jul 8, 2020 at 5:07 PM Philippe Blain > <levraiphilippeblain@xxxxxxxxx> wrote: >> >> Hello, >> >> I've been working on a branch for a while. I've been using `git commit --fixup` and `git commit --squash` >> when I noticed that I had forgotten to add something to a previous commit. >> Today I did `git rebase --autosquash` to clean up my history, and the rebase failed at the >> first 'fixup!' commit with a conflict. However, the conflict is not located at the right place >> in the code (it's not in the right subroutine!). This is very surprising to me, and I would >> like to understand why it happens. >> >> Steps to reproduce: >> >> git clone -b branch-to-be-rebased https://github.com/phil-blain/CICE.git cice >> cd cice >> git rebase -i --autosquash my-first-commit >> # save the todo list without modifications >> Auto-merging <file> >> CONFLICT (content): Merge conflict in <file> >> error: could not apply e8bfa55... fixup! <commit message of f4e1ae6> >> # the rebase stops at f4e1ae6 >> git diff >> # tangential question : for some reason the hunk header does not appear here, I don't know why... >> git diff -2 # but it appears here >> git grep -p -e '<<<<<<< HEAD' -e '>>>>>>> e8bfa55...' # or here >> # ok, the conflict appears in subroutine 'picard_solver' >> git show REBASE_HEAD -U5 >> # but the original "fixup!" commit only modifies the subroutine 'anderson_solver' !! >> >> I would have expected that the conflict be created around lines 1118-1132 >> (line numbers in f4e1ae6), in the 'anderson_solver' subroutine. >> >> I don't know if this plays a part here, but commit f4e1ae6 (where the rebase stops) >> is the commit where the 'anderson_solver' subroutine is added to the code... >> >> Thanks, >> >> Philippe. > > If you take a look where the rebase stops, you see: First, thanks a lot for your answer. I have a few questions. > > $ git ls-files -u > 100644 ee4377f1ec6836fa05573976a473373906c37d9f 1 > cicecore/cicedynB/dynamics/ice_dyn_vp.F90 > 100644 30c699ac371c2a751052fa98d04317e84a96ec47 2 > cicecore/cicedynB/dynamics/ice_dyn_vp.F90 > 100644 276f224e9048fe0bbd7c25822695049547362c87 3 > cicecore/cicedynB/dynamics/ice_dyn_vp.F90 > > The difference from the merge base to "other" (index 3) is pretty > tiny, you just moved one line in the "anderson_solver" subroutine > about 10 lines down. Yes, the output from $ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90 :3:cicecore/cicedynB/dynamics/ice_dyn_vp.F90 seems to be the same as the one from $ git show REBASE_HEAD This is a little confusing to me, in the sense that I don't understand why the merge-base is what it is. At this point, I do $ git merge-base HEAD REBASE_HEAD f4e1ae67b7d6ca36c6f3ea7c9da43d81caf24067 Ok, so 'git merge-base' finds that the merge-base between HEAD and REBASE_HEAD is HEAD; this makes sense to me (no previous commits have been rewritten so far, so REBASE_HEAD is directly ahead of HEAD). But, if I try to find the commit that contains the blob ee4377f1ec6836fa05573976a473373906c37d9f (index 1), I find REBASE_HEAD's parent: $ git log --all --find-object=ee4377f1ec6836fa05573976a473373906c37d9f --format='commmit %H%ntree %T%nparent %P%n%n %s%n' commmit e8bfa557d3c81b75116d6557784b0439b792a308 tree f6fecb8193c3b877f22bcb8f4d8d2c203e17f06f parent 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61 fixup! Add Anderson acceleration for Picard iteration commmit 7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61 tree 11fd096851015c0c16b793d9bbb5db039776483b parent 63d4c73c1dd973f620307833bd363a1d5069d090 ice_dyn_vp: introduce 'CICE_USE_LAPACK' preprocessor macro $ blob=ee4377f1ec6836fa05573976a473373906c37d9f $ parent=7a8d5a82984dfedd7fac1d7ed7c7fbd1781c1f61 $ git ls-tree -r $parent^{tree} | grep $blob 100644 blob ee4377f1ec6836fa05573976a473373906c37d9f cicecore/cicedynB/dynamics/ice_dyn_vp.F90 I don't understand why at this point of the rebase, Git determines that the merge-base between HEAD and REBASE_HEAD is REBASE_HEAD's parent... this commit is not even an ancestor of HEAD (or maybe I don't understand what the "merge-base" is in this context?)... > The more interesting difference is from the > merge base to "ours" (index 2), seen with: > > $ git diff :1:cicecore/cicedynB/dynamics/ice_dyn_vp.F90 > :2:cicecore/cicedynB/dynamics/ice_dyn_vp.F90 > > If you search for "anderson_solver" in that diff, you see that the > reconstructed diff looks like "subroutine anderson_solver" is renamed > to "subroutine picard_solver" with similar but slightly different > arguments. The diff routine successfully finds lots of lines common > between these two subroutines so it seems quite logical as a way of > representing the changes, especially since they occur near each other > in the order of the file. If you keep looking further in the diff, it > says that a few hundred lines later there's another subroutine rename, > this time from "subroutine calc_zeta_Pr" to "subroutine > anderson_solver". Looking at it this way, it's quite natural to apply > the change from the rebased commit to the picard_solver since the > other side of history "renamed" anderson_solver to picard_solver. > > This might not make as much sense to you, since you tend to think in > terms of parse trees and "subroutines" are a high level unit. But > diffs work in terms of lines and have no knowledge of any kind of file > structure or semantics. If you have a string of functions A,B,C and > insert a new function Z on one side of history somewhere in a file > that happens to look a lot like existing functions (meaning several > identical lines between the two), then diffs won't necessarily treat > it as a contiguous block of inserted lines but instead compare > function Z on one side to function B on the other, then function B on > the first side to function C on the second, etc. > > There are alternative diff algorithms that try to minimize the number > of changed lines shown by the diff, such as --histogram and > --patience, and they do shrink this particular diff, but they both > yield the same treatment of comparing "subroutine anderson_solver" to > "subroutine picard_solver" essentially treating it as a rename. To > get a merge like you want, you'd need some kind of higher level > semantic merge, or at least need functions to not have so many lines > in common between them. Thanks for these explanations. The behaviour makes sense. > > This kind of problem also causes big issues when reordering functions > in a file. line-by-line diffs and the diff3 merge algorithm tend to > really struggle with those. (See e.g. > https://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf if you want > to read up more on these.) Thanks for the pointer! > > Hope that helps, > Elijah Thanks, Philippe.