Re: RFC: rebase without pain

Luke Lu <git@xxxxxxxxxx> · Sun, 15 Jun 2008 12:36:34 -0700

On Jun 14, 2008, at 4:17 AM, Dmitry Potapov wrote:
On Fri, Jun 13, 2008 at 06:38:44PM -0700, Luke Lu wrote:
This may have been discussed before, but I could not find it. If so, I
apologize for the noise and hope somebody is working on the issue.

I think we have had a somewhat similar discussion not so long ago.
It was called "inexplicable failure to merge recursively across
cherry-picks". I think you can find it here:
http://kerneltrap.org/mailarchive/git/2007/10/9/333729

Please read these posts by Linus carefully:
http://kerneltrap.org/mailarchive/git/2007/10/10/334129
http://kerneltrap.org/mailarchive/git/2007/10/11/335451

Thanks for the pointers, Dmitry. They are indeed illuminating. I
actually agree with Linus on all counts here.

Based on my observation, rebase is the single most interesting and
misunderstood feature in git compared with other VCSs. Since I
discovered rebase -i, I can't stop using it, because I like to keep
my history clean for readability and maintenance purposes.

The downside of rebase is that you are *re-writing* branch history.

I think the word "history" is too overloaded in the VCS world.
Sometimes it means the actual chronological steps a developer
performed to solve a problem. Sometimes it just means a portion of the
commit/patch DAG, that is, a series of transformations (or functions,
if you prefer) that, given an input (e.g., a parent tree), will
produce a deterministic outcome. The two meanings are actually
orthogonal, despite the fact that they often coincide in practice. I
think it's a misuse of the word "history" here, when you actually mean
a set of patches. You can never rewrite history in the original sense
of the word, because you can't turn back time. The real history in git
is kept in the reflog.

Rebasing is rewriting a set of patches. It's a form of
(meta)programming to compose and decompose features. The problem with
rebasing (in addition to merging) is that we don't have proper tools
to track the rebase itself. Even though the history is in the reflog,
the information is only kept temporarily and is not propagated or used
by fetch/merge.
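
To make the "patches as functions" reading concrete, here is a minimal
Python sketch. Everything in it (Tree, Patch, add_file, apply_series)
is a name I made up for illustration; it is a toy model, not how git
stores or applies anything. It only shows that the same patch set can
be re-applied to a different base with a deterministic result, and that
the chronological record of what was done is a separate thing.

```python
from typing import Callable, Dict, List

Tree = Dict[str, str]            # path -> content (toy stand-in for a git tree)
Patch = Callable[[Tree], Tree]   # a patch is a pure function from tree to tree


def add_file(path: str, content: str) -> Patch:
    """Return a hypothetical patch that adds (or overwrites) one file."""
    def apply(tree: Tree) -> Tree:
        new_tree = dict(tree)    # copy, so the input tree is never mutated
        new_tree[path] = content
        return new_tree
    return apply


def apply_series(base: Tree, patches: List[Patch]) -> Tree:
    """Apply a patch series to a base tree, in order."""
    tree = base
    for patch in patches:
        tree = patch(tree)
    return tree


release_1: Tree = {"README": "v1"}
release_2: Tree = {"README": "v2", "NEWS": "changes in v2"}
series = [add_file("fix_a.c", "fix A"), add_file("fix_b.c", "fix B")]

# "Rebasing" in this model is re-applying the same functions to a new base.
on_v1 = apply_series(release_1, series)
on_v2 = apply_series(release_2, series)

# The chronological record is kept apart from the patch set, much as the
# reflog is kept apart from the commit DAG.
chronology = ["applied series on release_1", "re-applied series on release_2"]
```

Note that the patch set itself is unchanged by the "rebase"; only the
chronology list knows that it happened, which is roughly the tracking
gap described above.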

It is okay when you do that in your private branch, but when you
publish something there is no way back. It is like preparing an
article: you can make a lot of drafts, but once you publish it, it is
published. Any attempt to falsify history will cause a lot of
confusion.

I think the confusion mainly comes from conflating the two meanings of
the word "history", and from merge conflicts that result from the lack
of tool support.

Also, please note that even if a branch was rebased
without a single conflict, it does not mean that it will work.

The same applies to ordinary merges as well. The advantage of a merge
is that it maintains the original ancestry of the commits, so it's
relatively easy to visualize and debug merge problems.
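
A small Python sketch of how a conflict-free re-application can still
break. This is a toy model under my own assumptions: a "tree" is a dict
of module name to source text, my_patch and runs_ok are hypothetical
helpers, and "no conflict" simply means the patch touches a file that
upstream did not touch. The same failure would show up whether the two
changes were combined by a rebase or by a merge.

```python
from typing import Dict

Tree = Dict[str, str]  # module name -> Python source (toy stand-in for a tree)


def runs_ok(tree: Tree) -> bool:
    """Load every module into one shared namespace, then call main()."""
    namespace: dict = {}
    try:
        for name in sorted(tree):
            exec(tree[name], namespace)
        namespace["main"]()
        return True
    except Exception:
        return False


# The base the patch was written against, and a newer upstream base in
# which helper() has been renamed to helper_v2().
old_base: Tree = {"lib": "def helper():\n    return 1\n"}
new_base: Tree = {"lib": "def helper_v2():\n    return 1\n"}


def my_patch(tree: Tree) -> Tree:
    """Hypothetical patch: adds a new module that calls helper()."""
    new_tree = dict(tree)
    new_tree["main"] = "def main():\n    return helper()\n"
    return new_tree


works_on_old = runs_ok(my_patch(old_base))   # patch on its original base
works_on_new = runs_ok(my_patch(new_base))   # same patch, re-applied cleanly
```

The patch only adds "main" and never touches "lib", so nothing
conflicts textually on either base, yet the result on the new base
fails at run time because helper() no longer exists.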

So, you can break things just by rebasing

So can you by just straight merging.

and it will be impossible to
find later who caused the breakage.

Yes, that's the real problem. But it's mainly caused by the lack of
tools to track rebases. I think we should probably put a 'rebase' node
in the commit, just like a merge. The rebase commit would contain
enough information to track the rebase. git log could display who did
the rebase, and gitk could even visualize the graph transformation.
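
Roughly what I have in mind, as a Python sketch. To be clear, this is
purely hypothetical: git commit objects have no such field, and
RebaseRecord, Commit, rebase_trail and the ids and user names below are
all invented for illustration. The idea is only that a rewritten commit
carries a pointer to the commit it replaced, the new base, and who did
the rebase, so that a tool can walk the trail back.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RebaseRecord:
    """Hypothetical metadata; real git commits do not carry this."""
    original: str     # id of the commit that was rewritten
    onto: str         # id of the new base
    rebased_by: str   # who performed the rebase


@dataclass(frozen=True)
class Commit:
    id: str
    author: str
    parents: Tuple[str, ...] = ()
    rebase: Optional[RebaseRecord] = None


def rebase_trail(commit_id: str, commits: Dict[str, Commit]) -> List[RebaseRecord]:
    """Follow rebase records from a commit back to the original commit."""
    trail: List[RebaseRecord] = []
    current = commits.get(commit_id)
    while current is not None and current.rebase is not None:
        trail.append(current.rebase)
        current = commits.get(current.rebase.original)
    return trail


# a1 was written on base1, rebased onto base2 as a2, then onto base3 as a3.
commits = {
    "a1": Commit("a1", "alice", ("base1",)),
    "a2": Commit("a2", "alice", ("base2",), RebaseRecord("a1", "base2", "alice")),
    "a3": Commit("a3", "alice", ("base3",), RebaseRecord("a2", "base3", "bob")),
}

trail = rebase_trail("a3", commits)
```

Here trail lists the two rebases newest first, so a log-style tool
could report that the last rewrite was done by "bob" even though the
author of the change is still "alice".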

Sometimes, even if the final state after rebase is working, the
intermediate commits may not work or may not even compile.

Yes, that's another problem, but with rebase -i, we can hopefully fix  
them :)

So, I don't think that rebasing published history is a good idea.

Yes, rewriting a published patch set (again, you can't rewrite
history, public or not) without proper tools to track it is definitely
not a good idea. But don't you think we need to develop tools to track
rebases properly?

One common use case would be maintaining a patch set against a release
point. It's already a common practice, with or without VCS support:
people release some software version n, then release a giant patch
covering several bugs; later on, they realize that they need to split
the patch per bug, or vice versa. That's rebase right there. But
people are not really confused, because they know they need to reapply
the patch set against an official release.

You can't yet do that easily with git by simply pulling from upstream.

__Luke