Blanket request, so I don't have to keep repeating it: Please correct me
if I'm wrong about anything below. I'm more familiar with git than I was
a month ago, but still not an expert, so I could be totally off (re)base
here. With that note of confidence...
We have a setup like this:
        (external)
            |
       local master
            |
       integration
        /   |   \
    dev1  dev2  dev3
We pull changes from the external repository (actually a Subversion
repo) into a local master. The integration repo is a clone of that.
That's our local setup, but the particulars don't matter here -- I'm
just using it as an example.
Ideally, we'd rebase the integration area against changes pulled from
master, then each dev repository would rebase against the changes from
the integration area. That would keep our histories nice and clean as we
pull changes down from the external repository.
But of course rebase will get confused and we'll end up re-applying
changes in the dev sandboxes as soon as there are any existing changes in
the integration repo when we pull changes from master, because rebase
will turn those existing changes into new revisions that don't match any
previously known ones in the dev repositories.
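
To make the mismatch concrete, here is a toy sketch in Python. It assumes
a simplified model in which a revision ID is just a hash of the parent ID
plus the change; real git hashes more than that (tree, author, committer,
message), and the names commit_id and build_chain are made up for this
illustration. The point is only that replaying the same changes on a new
parent yields IDs the dev repositories have never seen.

```python
import hashlib

def commit_id(parent_id, change):
    """Toy stand-in for a commit ID: a hash over the parent ID and the change."""
    return hashlib.sha1(f"{parent_id}\n{change}".encode()).hexdigest()

def build_chain(base_id, changes):
    """Apply changes in order on top of base_id and return the resulting IDs."""
    ids = []
    parent = base_id
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids

# Two states of master: before and after pulling from the external repo.
old_master = commit_id("", "upstream r1")
new_master = commit_id(old_master, "upstream r2")

# The same integration changes, first on the old base, then rebased.
local_changes = ["integration fix A", "integration fix B"]
before_rebase = build_chain(old_master, local_changes)
after_rebase = build_chain(new_master, local_changes)

# Same changes, different parents, so the two chains share no IDs.
shared = set(before_rebase) & set(after_rebase)
print(len(shared))
```

A dev repo that already holds the before_rebase IDs has no way to tell,
from the IDs alone, that the after_rebase revisions carry the same changes.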
So at the moment, as far as I can see, the only option is to use merge
rather than rebase everywhere but the leaf nodes of our repository tree,
and just live with the cluttered history. The developers will at least
have clean *local* histories, but they'll be rebasing onto a cluttered
history from the integration repo.
However, they may not want to, even if they can: as soon as I rebase,
unless everyone is very careful, I have just prevented other developers
from pulling my local commits into their local repositories before I've
pushed my stuff up to the integration area. Sibling-to-sibling pulls --
which I hope we all agree are a very useful feature of systems like git
-- have exactly the same rebase problem as parent-to-child pulls: you'll
end up re-applying the same changes if the target repo had an earlier
version of a newly-rebased chain of commits. So even in our development
repos, I suspect we'll want to avoid rebasing unless we're certain we
won't ever need to share changes directly with each other, and just live
with the clutter.
All of which made me think, gee, it'd sure be nice if there was a way to
filter out those excess merges when we view our branch history. I think
all it would take would be to mark a merge commit as a rebase-ish update
(rather than an actual integration where the merge itself is an event
that's important to us) and you could, if the user chose, discard those
merges from views of the branch history.
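
Roughly what I have in mind, as a Python sketch. The update_merge marker
is hypothetical (as far as I know git stores no such flag on a commit),
and Commit and visible_history are names invented for this example. It
only shows the view-side filtering, not how the marker would be recorded.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    subject: str
    parents: int = 1            # number of parent commits
    update_merge: bool = False  # hypothetical "rebase-ish update" marker

def visible_history(commits, hide_update_merges=True):
    """Return the commits to display, optionally hiding marked update merges."""
    return [
        c for c in commits
        if not (hide_update_merges and c.parents > 1 and c.update_merge)
    ]

history = [
    Commit("Add parser"),
    Commit("Merge from integration", parents=2, update_merge=True),
    Commit("Fix parser bug"),
    Commit("Merge feature-x into integration", parents=2),
    Commit("Merge from integration", parents=2, update_merge=True),
]

subjects = [c.subject for c in visible_history(history)]
print(subjects)
```

The merge of feature-x stays visible because it is a real integration
event; only the marked update merges are dropped. By comparison,
"git log --no-merges" hides every merge, which is coarser than this.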
And then it occurred to me: if we had that, would we actually need
rebase at all? As far as I know, rebase is all about aesthetics, not
functionality; the reason you rebase instead of merge is that you don't
want to wade through zillions of irrelevant merges when you browse your
project's history. But if those merges are simply not shown to you, it
shouldn't matter that they exist. Yes, you will have more objects in
your repository, but with git's delta compression, you might not even
notice the difference.
Rebase has an interesting undesirable property aside from messing up
downstream clones when they try to pull your latest changes. Since
rebase preserves the author timestamps on your local commits, you almost always
end up with a situation where the history says almost all your local
commits happened before the commit they claim to have been branched
from. That's not too big a problem in most cases, but it sure isn't very
clean. The underlying problem is that when you rebase you lose not only
the history of your intermediate updates, but also the history of your
original branch creation.
Filtering rather than rewriting history would fix all of that. We could
easily report that branch XYZ was forked from branch ABC on 01/03/2006
and is up to date with all of ABC's changes up to 01/18/2006 (i.e.,
display information about the initial branch and the most recent update
merge, but none of the ones in between.) The timestamps all make sense
because you haven't lost the history. And if someone has cloned your
repository, they can keep pulling updates without anything breaking.
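
A small Python sketch of that report, assuming we can already pull out the
fork date and the dates of the update merges from the history. The
function name branch_summary and its parameters are invented for this
example; the dates are the ones from the paragraph above.

```python
from datetime import date

def branch_summary(name, parent, fork_date, update_merge_dates):
    """Summarize a branch from its fork date and its most recent update merge."""
    # With no update merges yet, the branch is only current as of the fork.
    latest = max(update_merge_dates, default=fork_date)
    return (
        f"{name} was forked from {parent} on {fork_date:%m/%d/%Y} "
        f"and is up to date with {parent} as of {latest:%m/%d/%Y}"
    )

summary = branch_summary(
    "XYZ",
    "ABC",
    date(2006, 1, 3),
    [date(2006, 1, 9), date(2006, 1, 18), date(2006, 1, 12)],
)
print(summary)
```

The intermediate update merges are still in the history; the report just
ignores everything except the first and the latest.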
It might also be possible to implement an after-the-fact rebase to
reduce the number of excess commits: a command that rebases all the
update merges older than a certain age, on the theory that you can
usually put an upper bound on how out-of-date someone's clone of your
repo is allowed to get. That would rewrite just the ancient history, not
touching anything recent, and would mark the newly-created revisions
such that pull could skip fetching them if the target already has the
more recent revisions. (The revision at which you stop rebasing should
end up with the same revision ID as the one in the actual history, since
the contents should match.) If that's not clear I can draw a picture.
Haven't thought that bit through too much so it might not be feasible in
the end, but it seems like in theory we have all the information we need
to resolve conflicts, etc.
Comments? Am I fundamentally misunderstanding how rebase works and/or
why the documentation warns people away from using it in repos that
might be pulled from?
-Steve