Re: Rebase safely (Re: cherry picking and merge)

On Wed, Aug 06, 2014 at 05:38:43PM -0700, Mike Stump wrote:
> Oh, wait, maybe I have misunderstood the prohibition.  I have:
> 
>        upstream  <—— fsf
>            |
>             \
>              |
>              v
> Me  <—>   master  <—> coworker.

This looks a lot like what I meant about project repos.

> Me is a git clone of master, coworker is a git clone of master.
> Master is a bare repo on a shared server where we put all of our work.
> upstream is a bare git repo of the fsf git tree for gcc.  fsf is a box

Yes, exactly.  We used exactly this at Sun, with a rebase-only
workflow.  You won't believe it till you see it [below].

> owned by other that hosts the gcc git repository.  I do a git pull fsf
> in upstream from time to time, and a corresponding git merge fsf in Me
> from time to time.  When I like my work I do a git push (to master
> exclusively).  To go to upstream, we submit patches by hand, git is
> not really involved.  I never pull into master from upstream (don’t
> even think that’s possible since they are both bare).

I see.  Hey, if that works for you...  You could, of course, merge or
cherry-pick, or rebase your team's commits onto another copy of the FSF
(upstream) master and then send those commits: sending commits is better
than sending diffs, IMO, mostly because you get to have some metadata
and integrity protection, and because git can ensure lineage and so on.
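
A minimal sketch of what I mean (the remote name "upstream" and the
branch name "for-upstream" are just illustrative): rebase the team's
work onto upstream's master and then hand over the commits themselves:

$ git fetch upstream
$ git checkout -b for-upstream master      # the team's work from your shared repo
$ git rebase upstream/master               # now it sits directly on top of upstream's tip
$ git format-patch upstream/master         # one file per commit, author/date/message intact
$ git send-email *.patch                   # (with your usual --to/--smtp settings)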

But you could live purely with diff/patch, no question, and anywhere
between that and making full use of a VCS' powers.

Here now is what we did at Sun, mapped onto git, written out as a
step-by-step walkthrough to be more exact.

Remember, this was what we did for _all_ of Solaris.  You can probably
still find docs from the OpenSolaris days describing how to do it with
Mercurial, so you can see I'm not lying.  Thousands of engineers,
working on many discrete projects, with a large OS broken up into a few
"consolidations" (each with its own repo).

(Here the "project gate" is the team repo, that I think you call
"master" above.)

$ # on a bi-weekly (or whatever's best) basis:
$
$ git clone $foo_project_gate foo
$ cd foo
$ git remote add upstream ...
$ git fetch upstream
$ git checkout $current_master
$ new_snapshot=$(date +%Y-%m-%d)
$ git checkout -b master-$new_snapshot
$ git rebase upstream/master
$ git push origin master-$new_snapshot
$
$ mutt -s "PROJECT FOO: Rebase onto new master branch master-$new_snapshot" foo-engineers < /dev/null

Then the engineers on this project do this (at their leisure):

$ old_snapshot=<YYYY-mm-dd from current master branch>
$ new_snapshot=<YYYY-mm-dd from new master branch>
$ cd $my_foo_project_clone
$ git fetch origin
$ for topic_branch in ...; do
    git checkout -b ${topic_branch%"-${old_snapshot}"}-$new_snapshot $topic_branch
    git rebase --onto origin/master-$new_snapshot origin/master-$old_snapshot
  done
$
$ # Ready to pick up where I left off!
...

Eventually engineers integrate commits into the project gate:

$ # I'm ready to push to the project gate!
$
$ git checkout some_topic_branch
$
$ # Note: no -f!
$ git push origin HEAD:master-$current_snapshot
...
$ # yay

Eventually the project is ready to push its commits upstream:

$ git clone $project_gate foo
$ cd foo
$ git remote add upstream ...
$ git checkout master-$current_snapshot
$ git push upstream HEAD:master

If you're not going to send all local commits upstream yet, you can do
an interactive rebase: put the commits you do want to send first,
immediately on top of the upstream's HEAD commit, and all the others
after them, then push only up to the last of the commits you want to
send.  If you do this you should create a new snapshot branch and tell
your team members to git rebase --onto it.
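
Something like this (the branch name and the commit placeholder below
are made up, just to illustrate):

$ git checkout -b reordered master-$current_snapshot
$ git rebase -i upstream/master      # move the commits to send to the top of the todo list
$ git push upstream <last-commit-to-send>:master
$ # then cut a new snapshot branch from "reordered" and announce it as above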

Note that we're always rebasing _new_ branches.  Never old ones.  The
project gate does plain rebases of those new branches.  Downstreams have
to rebase --onto to "recover" (it works fine).

This is a very rebase-happy workflow.  It keeps as-yet-not-contributed
commits "on top" relative to the immediate upstream of any repo.  This
makes them easy to identify, and it keeps the author/date/subject
metadata.  Because you rebase often, you don't lag the upstream by much.
Because they are "on top" it's always fast-forward merge to push --
you're always "merged", with some lag, yes, but merged.  And the person
doing the merging is the owner of the repo (team members, project
gatekeeper).
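
Seeing the as-yet-not-contributed commits is a one-liner (remote name
"upstream" assumed, as above):

$ git log --oneline upstream/master..master-$current_snapshot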

It's a bit more work each time you rebase than a merge-heavy workflow.
But it's also easier to contribute, and it's easier on each successive
upstream's maintainers.

(The upstream also kept "snapshot" branches.  Doing this has many good
side effects, not the least of which is that git prune (and gc, which
invokes it) doesn't go deleting the past of each rebase.)
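
A quick way to see that (the snapshot dates here are made up):

$ git branch --list 'master-*'
  master-2014-07-23
  master-2014-08-06
$ # the pre-rebase commits are still reachable from the older snapshot
$ # branch, so git gc / git prune won't collect them:
$ git log --oneline master-2014-08-06..master-2014-07-23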

> > The only use-case I've seen where a rebase-based workflow doesn't work
> 
> Well, and now mine, which I claim is a the canonical open source use
> [...]

Nah.  Sun managed this for decades without a hitch, and for products
much larger than GCC.  See above.

(It's true that it's difficult to sell some people on this workflow,
especially when their previous experiences are with VCSes that look down
on rebase.  You don't have to buy it either.  However, it works very
well.)

> I’m trying to envision how anyone could ever use rebase.  If you
> can’t share your work, it isn’t work.

Do some experiments based on the above walkthrough.  If that doesn't
convince you that it works, oh well, I'll have given it a good try.
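
If it helps, here's a throwaway sandbox to try it in (all the paths and
names are made up, of course):

$ git init --bare /tmp/upstream.git
$ git init --bare /tmp/gate.git
$ git clone /tmp/upstream.git /tmp/seed && cd /tmp/seed
$ echo hello > README && git add README && git commit -m 'initial'
$ git push origin HEAD:master
$ git clone /tmp/gate.git /tmp/foo && cd /tmp/foo
$ git remote add upstream /tmp/upstream.git
$ git fetch upstream
$ git checkout -b master-$(date +%Y-%m-%d) upstream/master
$ git push origin HEAD
$ # ...and from here play the engineer and gatekeeper roles as above.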

Nico