Re: rebase-with-history -- a technique for rebasing without trashing your repo history

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 14 Aug 2009 00:39:48 +0200

Björn Steinbrink wrote:
> On 2009.08.13 14:46:07 +0200, Michael Haggerty wrote:
>> (Please note that this technique only works for the typical "tracking
>> upstream" type of rebase; it doesn't help with rebases whose goals are
>> changing the order of commits, moving only part of a branch, rewriting
>> commits, etc.)
> 
> Hm, so that pretty much doesn't work at all for creating a clean patch
> series, which usually involves rewriting commits, squasing bug fixes
> into the original commits that introduced the bug etc.

Now that you mention it, there are some other uses of rebase whose
history could be recorded correctly, or at least better, in the DAG.  I
am not ready to advocate any of these changes, but I think they are
worth discussing.

A squash of two adjacent commits currently transforms this:

A---B1---B2---C

to this

A---B12---C'

(C' has the same contents and commit message as C but a different
history and therefore a different SHA1.)

But B12 includes both B1 and B2, so the correct history is this:

A---B1---B2----C
 \         \    \
  ---------B12---C'

The fact that B12 has B1 and B2 as ancestors tells git that it
incorporates both of their changes, which correctly describes reality.

Splitting a commit (not really an elementary rebase operation, but
achievable with "edit") transforms this:

A---B12---C

to this:

A---B1---B2---C'

It is not possible to represent this correctly in the DAG, because there
is no way to express that "B1" includes part of "B12" as a parent.  But
the following would be more accurate than discarding all history:

A---B12---C
 \     \   \
  B1---B2---C'

It would of course be difficult for the user-interface layer to be
confident that the changes in B1 and B2 are really equivalent to B12
unless the content of B12 and B2 are identical.

Inserting a new commit into the history (for example as part of
reordering later commits) transforms this:

A---C

to this:

A---B---C'

In this case C' includes everything that is in C, so the correct history is

A---C
 \   \
  B---C'

However, deleting a commit, which transforms this:

A---B---C

to this:

A---C'

cannot be represented in the DAG, because there is no way to express
that C' includes the changes from C without also implying that it
includes the changes from B.

Rewriting a single commit, under the assumption that the rewritten
commit is the logical equivalent of the original, transforms this:

A---B1---C

to this:

A---B2---C'

where B2 is a hand-rewritten version of B1, and C' is the version of C
produced by the rebase.  In this case, the history could be recorded as:

A---B1---C
 \    \   \
  ----B2---C'

but again, it is impossible for the user-interface layer to ascertain
that B2 is equivalent to B1 without help from the user.

All of this extra history would currently create far more clutter than
it is worth, but if there were a way to suppress the display of rebased
commits (as discussed in the third article I quoted), then the extra
information would be there to help git without overwhelming users.

> And even for just continously forward porting a series of commits, a
> common case might be that upstream applied some patches, but not all.
> Can you deal with that?
> 
> Example:
> 
> A---B---C (upstream)
>      \
>       H---I---J---K (yours)
> 
> Upstream takes some changes:
> 
> A---B---C---I'--K'--D (upstream)
>      \
>       H---I---J---K (yours)
> 
> rebase leads to:
> 
> A---B---C---I'--K'--D (upstream)
>                      \
>                       H'--J' (yours)
> 
> What would your approach generate in that case?

There *is no way* to represent this history in a DAG, and therefore the
history of this operation will necessarily be lost.  (Well, of course it
could be recorded in metadata supplemental to the DAG, but since the
history would not affect future merges it would be pointless.)  The
problem is that there is no way to claim that I' is derived from I
without also implying that I' includes the change in H (which it
doesn't).  I discuss this sort of thing in another article [1].

>> For more information, please see the full articles: [...]
> 
> In this one you have two DAGs:
> (I fixed the second one to also have the merge commit in "subsystem"
> instead of "topic", so they only differ WRT to the rebased stuff)
> 
> A)
> m---N---m---m---m---m---m---M  (master)
>      \                       \
>       o---o---O---o---o       o'--o'--o'--o'--o'--S  (subsystem)
>                        \                         /
>                         *---*---*-..........-*--T (topic)
> 
> 
> B)
> m---N---m---m---m---m---m---M  (master)
>      \                       \
>       \                       o'--o'--o'--o'--o'----------S  (subsystem)
>        \                     /   /   /   /   /           /
>         --------------------o---o---O---o---o---*---*---T (topic)
> 
> 
> And you say that the former creates problems when you want to merge
> again. How so?

As you very clearly showed (thanks!), the merge problems that I claimed
only occur in some obscure edge cases.

What I *should* have emphasized is that the merge S itself is much more
prone to conflicts in case A) (with merge base N) than in case B) (with
the last "o" as merge base).  That is the first advantage of
rebase-with-history.

And please note that I really advocate C), not B):

C)
m---N---m---m---m---m---m---M  master
     \                       \
      \                       o'--o'--o'--o'--o'  subsystem
       \                     /   /   /   /   / \
        --------------------o---o---O---o---o   \
                                             \   \
                                              \   *'--*'--*'  topic
                                               \ /   /   /
                                                *---*---*

where the topic branch is not merged into the subsystem branch but
rather rebased-with-history.  C) has the significant advantage over A)
or B) that the topic branch can be converted to a series of patches (the
*' patches) that apply cleanly to the rebased subsystem branch and can
therefore be submitted upstream.  In the case of A) or B), the only
available patch that applies cleanly to the rebased subsystem branch is
S, which is a single commit that squashes together the entire topic
branch and is therefore difficult to review.

So rebasing in a public repository makes it difficult for downstream
developers to apply their work to the rebased branch (because they have
to repeat the conflict resolution that was done in the upstream rebase),
and merging in a topic branch makes it more difficult to create an
easily-reviewable patch series.  rebase-with-history has neither of
these problems.

Michael

[1] "Git, Mercurial, and Bazaar—simplicity through inflexibility",
http://softwareswirl.blogspot.com/2009/08/git-mercurial-and-bazaarsimplicity.html

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html