Re: git-rebase produces incorrect output

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 29, 2019 at 12:24 AM Pavel Roskin <plroskin@xxxxxxxxx> wrote:
>
> Hi!
>
> I've discovered an issue with "git rebase" producing a subtly
> incorrect file. In fact, that files even compiled but failed in unit
> tests! That's so scary that I'm going to stop using "git rebase" for
> now. Fortunately, "git rebase --merge" is working correctly, so I'll
> use it. Too bad there is no option to use "--merge" by default.

Indeed.  We really should fix that, if not just make it the default
for everyone.

> The issue was observed in git 2.23 and reproduced in today's next
> branch (2.24.0.449.g4c06f74957) on up-to-date Fedora 31 x86_64.
>
> I've created a repository that demonstrates the issue:
> https://github.com/proski/git-rebase-demo
>
> The branch names should be self-explanatory. "master" is the base,
> "branch1" and "branch2" contain one change each. If "branch1" is
> rebased on top of "branch2", the result is incorrect, saved in the
> "rebase-bad" branch. If "git rebase -m" is used, the result is
> correct, saved in the "merge-good" branch.
>
> The files in "rebase-bad" and "merge-good" have exactly the same lines
> but in a different order. Yet the changes on branch1 and branch2
> affect non-overlapping parts of the file. There should be no doubt how
> the merged code should look like.
>
> I believe the change on branch2 shifts the lines, so that the first
> change from branch1 is applies to a place below the intended location,
> and then git goes back to an earlier line to apply the next hunk. I
> can imagine that it would do the right thing in case of swapped blocks
> of code. Yet I have a real life example where it does a very wrong
> thing.
>
> Indeed, "git diff origin/branch2 origin/rebase-bad" and "git diff
> origin/branch2 origin/merge-good" both produce diffs of 9957 bytes
> long, different only in the order of the hunks.
>
> Another interesting data point - "git rebase --interactive" is working
> correctly.

Thanks for the detailed report and simple testcase.  Turns out the
--interactive isn't so interesting, because a few cycles back we
re-implemented the --merge behavior on top of the interactive
machinery so the two use the exact same engine.  Anyway, I can
duplicate the problem and noticed a few interesting things.  Since the
am-backend for rebase (the default) basically just uses diff and
apply, I tried duplicating with just those after looking at things and
noticing that it appeared to be applying patch hunks on the wrong
lines:

$ git switch branch2

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U3 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
 1 file changed, 43 insertions(+), 43 deletions(-)
$ git diff --shortstat origin/rebase-bad

So, this reproduces your bad results.  Let's repeat with -U4:

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U4 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
 1 file changed, 10 insertions(+), 10 deletions(-)
$ git diff --shortstat origin/rebase-bad
 1 file changed, 37 insertions(+), 37 deletions(-)

That gives us a result that matches neither merge-good nor rebase-bad,
but is closer to the good side.  Let's try again with -U5:

$ git reset --hard origin/branch2
HEAD is now at 1331204 Change on branch 2
$ git diff -U5 origin/master origin/branch1 >diff.patch
$ git apply diff.patch
$ git diff --shortstat origin/merge-good
$ git diff --shortstat origin/rebase-bad
 1 file changed, 43 insertions(+), 43 deletions(-)

Ahah!  With five lines of context, git diff & git apply can produce
the correct result.

Sadly, I tried to force this with git rebase, but -C5 only affected
the apply side and there's no option to pass to rebase to pass through
-U5 to the diff logic.  Also, although there is a diff.context config
option, git-am ignores it (Note that git_am_config() does not directly
check that value and it calls git_default_config(), not
git_diff_ui_config() or even git_diff_basic_config()).  So, it's not
possible to force the am-based rebase to get the right answer
currently even if you figure out what the problem is.

The merge-based rebase, by contrast, essentially benfits from having
the entire files of each version accessible so it automatically gets
it right.


So, to summarize here:
  * you have a case where the default 3 lines of context mess stuff
up; but rebase --merge works great
  * am doesn't have a -U option, and ignores the diff.context setting,
making it impossible to force the am backend to work on your case
and also:
  * rebase doesn't have an option to use the merge/interactive backend
by default (nor an --am option to override it)
Also,
  * The performance of the merge/interactive backend is slightly
better than the am-backend
(https://public-inbox.org/git/CABPp-BF=ev03WgODk6TMQmuNoatg2kiEe5DR__gJ0OTVqHSnfQ@xxxxxxxxxxxxxx/)
and will be getting better
  * The merge/interactive backend supports many more options than the
am-backend, though the am one still has a few the merge backend
doesn't.  Once the ra/rebase-i-more-options topic merges, --whitespace
will be the only consequential option that the am-backend supports
that the merge/interactive-backend doesn't.  (There's also -C, but as
noted above, the merge/interactive backend already have access to the
full file).

Maybe we should just switch the default, for everyone?  (And provide
an --am option to override it and a config setting to get the old
default?)



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux