Re: Workflow for templates?

Enrico Weigelt <enrico.weigelt@xxxxxxx> · Sat, 10 Nov 2012 08:13:49 +0100 (CET)

> I am somewhat unsure whether it would work this way. After all, there
> seems to be an unbreakable rule with git: never rebase published
> branches.

I don't see a big problem if you just tell the downstreams to rebase
instead of merge downwards.

That's, e.g., my default approach for handling things like local
customizations. The nice thing here is that you'll always have a
clear separation between upstream development and your customizations.

Let's say you once forked at release tag v1.2.3, added 3
customization commits and later rebased onto v1.2.4: you'll still
have your 3 customization commits on top of the upstream release.
With merge, you'll get more and more merge commits mixed in with
later customizations, and a much higher chance of repeating conflicts.
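
The v1.2.3 --> v1.2.4 case can be tried out in a throwaway repository.
This is only a sketch: the branch names "upstream" and "site-custom",
the file names and the demo identity are made up for illustration.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q demo && cd demo
git config user.name demo && git config user.email demo@example.invalid

# "Upstream" history with a first release tag.
echo base > template.conf && git add template.conf
git commit -q -m "upstream: release 1.2.3" && git tag v1.2.3
git checkout -q -b upstream

# Fork at v1.2.3 and add three customization commits.
git checkout -q -b site-custom v1.2.3
for i in 1 2 3; do
    echo "local setting $i" > "custom$i.conf" && git add "custom$i.conf"
    git commit -q -m "customization $i"
done

# Upstream publishes the next release.
git checkout -q upstream
echo "upstream change" >> template.conf
git commit -q -a -m "upstream: release 1.2.4" && git tag v1.2.4

# Move the customizations from the old release onto the new one.
git rebase -q --onto v1.2.4 v1.2.3 site-custom

# Only the three customization commits sit on top of v1.2.4.
git log --oneline v1.2.4..site-custom
```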

I'd suggest some general rules:

* strict branch hierarchy
* downstreams always rebase instead of merge
* probably use --onto rebase
* development always happens in topic branches, which are rebased
  before being merged into their upstream --> fast-forward only
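
A minimal sketch of the last rule, assuming a branch called "mainline"
and a topic branch called "topic/feature" (both names are invented for
this example):

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q demo && cd demo
git config user.name demo && git config user.email demo@example.invalid

echo one > file.txt && git add file.txt && git commit -q -m "initial"
git checkout -q -b mainline

# Start a topic branch and commit there.
git checkout -q -b topic/feature
echo feature > feature.txt && git add feature.txt
git commit -q -m "add feature"

# Meanwhile mainline moves on.
git checkout -q mainline
echo two >> file.txt && git commit -q -a -m "mainline work"

# Rebase the topic onto its upstream first, then merge fast-forward only.
git checkout -q topic/feature && git rebase -q mainline
git checkout -q mainline && git merge -q --ff-only topic/feature

# Counts the merge commits in mainline's history.
git rev-list --merges --count mainline
```

With --ff-only the merge refuses to run if the topic has not been
rebased first, so an accidental merge commit cannot slip in.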

> Maybe I should try to explain the problem in terms of repository
> hierarchy. Let's assume, there is this hierarchy of repositories:

Let's talk about branches instead - repos are just containers for
branches (and tags, etc). If all people are practically in the same
administrative domain (or sort of), you can even use one single
repo for that (not counting developer's and target system's local
clones).

> upstream: central repository, containing the generic template
> 
> foo-site: repository for site foo. Here we have localizations for a
>           specific administrative entity named foo (say, google).
>           This is where clones for production are made from, and
>           production boxes pull from here to be kept up-to-date.

Only the non-customized boxes will pull from here - if there's any bit
that needs to be changed, add separate branches for them.

And "pull" always means rebase.

When a new upstream release comes out (and is properly validated), the
site branch will be rebased on top of it.

> foo-devA: A clone of foo-site to make development, releases, and
> whatever for foo.
> foo-devB: One more clone of foo-site, Developer B is working here.

Developers should use topic branches, which are regularly rebased
on top of their upstream, especially before commit and final validation.

> Further, foo-devA might be the same person as bar-devA.

He'll use separate branches anyways. Everything else is just a matter
of proper naming scheme.

For example, if you're using a central (bare) repository (again: not
counting the developers' local clones), you could use something like
a <site>+"/" branch name prefix.

By the way: you really should use non-conflicting tag names (e.g.
adding some <site>+"/" or <site>+"-" prefix), otherwise you'll
easily run into conflicts, because by default retrieved and local
tags all end up in the same namespace - you probably won't like to
set up separate namespaces for individual remotes (which is quite
easy to forget ;-o). Better consider tag names to be really global.
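
As a small illustration of such a naming scheme (the site names "foo"
and "bar", the branch name "production" and the tag name "r1" are just
placeholders):

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q central && cd central
git config user.name demo && git config user.email demo@example.invalid
echo base > template.conf && git add template.conf
git commit -q -m "template"

# Per-site branches and tags share a "<site>/" prefix.
git branch foo/production
git branch bar/production
git tag foo/r1
git tag bar/r1

# Everything belonging to one site can be listed by prefix.
git branch --list 'foo/*'
git tag -l 'foo/*'
```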

> So when foo-devA pulls from foo-devB, then foo-devB will create
> problems when he rebases after that pull.

pull (or rather: remote update) is different from merge or rebase.
Essentially, pull is a combination of remote update and an automatic
merge from, or rebase onto (depending on the configuration), the
corresponding upstream branch.
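
To make the two halves visible, here is a sketch that does them by hand
in a scratch setup. The repository names "upstream-repo" and
"downstream" are invented, and the branch name is pinned to "master"
only so the example does not depend on local defaults.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q upstream-repo && cd upstream-repo
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
git config user.name demo && git config user.email demo@example.invalid
echo one > file.txt && git add file.txt && git commit -q -m "upstream 1"
cd ..

git clone -q upstream-repo downstream && cd downstream
git config user.name demo && git config user.email demo@example.invalid
echo mine > local.txt && git add local.txt && git commit -q -m "local change"

# Upstream moves on.
(cd ../upstream-repo && echo two >> file.txt && git commit -q -a -m "upstream 2")

# Step 1: only refs/remotes/origin/* is updated; the local branch is untouched.
git remote update origin >/dev/null 2>&1
# Step 2: integrate explicitly, here by rebasing.
git rebase -q origin/master

# The local commit now sits on top of the updated origin/master.
git log --oneline
```

Running "git pull --rebase" in the downstream clone would perform both
steps in one go.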

> What I am trying to achieve, is to extend the workflow from
> development to deployment across multiple administrative entities.
> As a picture:
> 
>   upstream     (templates only).
>      ^
>      |
>      v
>   development  (configured, might contain experimental changes)
>      ^
>      |
>      v
>   deployment   (configured)
> 
> This workflow should not stop at administrative borders. Just replace
> foo by google and bar by Microsoft to get an idea of what I am trying
> to achieve.

We're talking about two entirely different things here:

a) repositories: containers that hold references to histories
   (branches, tags, etc)

b) branches and their semantic relations

Repositories:

As git is fully distributed, it doesn't really matter where repositories
are. Developers (and other parties accessing the code) will most likely
have their own local clone. But "clone of X" means nothing more than that
the repository happens to have some remote attachment to repo X.

So, the semantics of

    git clone /path/to/my/funny-project

is the same as:

    ( git init funny-project && \
        cd funny-project && \
        git remote add origin /path/to/my/funny-project && \
        git remote update origin && \
        git checkout origin/master -b master )

So, let's look at the individual steps:

   #1: git init funny-project
   --> ( mkdir funny-project && cd funny-project && git init )
   --> creates an empty repository

   #2: git remote add origin /path/to/my/funny-project
   --> configures a remote called "origin" with url "/path/to/my/funny-project"
       and configures it to sync the remote side's references from refs/heads/*
       to the local refs/remotes/origin/*, and the remote side's refs/tags/* to
       the local refs/tags/* (without overwriting existing tag references)

   #3: git remote update origin
   --> does the actual syncing from remote "origin": gets the remote ref list,
       downloads all objects not yet present locally (that are required for
       the refs to be synced) and adds/updates the refs in the corresponding
       target namespaces
       (BTW: if a branch was removed on the remote side, the local copy in
       refs/remotes/<remote-name>/* won't be deleted - you'll need to call
       git remote prune <remote-name> for that)

   #4: git checkout origin/master -b master
   --> copies the current refs/remotes/origin/master ref to refs/heads/master
       and checks out that new local branch (IOW: sets the HEAD symbolic
       ref to refs/heads/master and fills index and working tree from the
       head commit)
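
What step #2 actually stores can be inspected with git config. A small
sketch in a scratch directory (nothing is fetched here, so the path does
not need to exist):

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q funny-project && cd funny-project
git remote add origin /path/to/my/funny-project

# The remote is nothing more than two configuration entries.
git config --get remote.origin.url     # /path/to/my/funny-project
git config --get remote.origin.fetch   # +refs/heads/*:refs/remotes/origin/*
```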

Branches are something completely different:

Logically, a branch is a history of commits with parent-child relationships
(mathematically speaking, it's a directed acyclic graph): each commit may
have a variable number of parent commits.

Technically, what we usually call a "branch" is in fact a name (a reference
in the refs/heads/* namespace) which points at the head commit of that local
branch. When you do git commit, it creates a new commit object from the
index, adds some metadata (e.g. your commit message) and sets the current
branch reference (usually the one the symbolic reference HEAD points to)
to the new commit object's SHA-1 ID. IOW: you add a new object
in front of the DAG and move the pointer one step forward in the line.
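
This pointer movement can be watched directly. A sketch in a throwaway
repository (the file name and the demo identity are arbitrary):

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q demo && cd demo
git config user.name demo && git config user.email demo@example.invalid

echo a > f && git add f && git commit -q -m "first"
branch=$(git symbolic-ref HEAD)     # the ref HEAD points to, under refs/heads/
before=$(git rev-parse "$branch")

echo b >> f && git commit -q -a -m "second"
after=$(git rev-parse "$branch")

# The branch ref now names the new commit, whose parent is the old head.
echo "$branch: $before -> $after"
git rev-parse "$after^"
```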

When you do a merge (no matter if the source is remote or local - it just
needs to be a locally available object), there are essentially two things
that can happen:

a) your source is a direct descendant of the target branch (IOW: the
   target's current head commit appears somewhere in the source's history):
   it will just move the current branch forward to the merge source
   (moves the head pointer and updates index and worktree).
   This is called "fast-forward" (in fact, it's the fastest kind of merge).

b) your source is not a direct descendant: the source tree will actually be
   merged into index/worktree, possibly stopping when there are conflicts
   to be resolved manually, and a new commit is created containing the
   current (now merged) index and two parent pointers: to the source and
   to the previous head of the merge target.
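
Both cases can be reproduced in a scratch repository. In this sketch the
branch names "target" and "source" simply follow the wording above, and
the files are arbitrary:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q demo && cd demo
git config user.name demo && git config user.email demo@example.invalid
echo a > a.txt && git add a.txt && git commit -q -m "base"
git checkout -q -b target

# Case a: "source" is a direct descendant of "target".
git checkout -q -b source
echo b > b.txt && git add b.txt && git commit -q -m "source work"
git checkout -q target
git merge -q source                      # fast-forward: no new commit
ff_head=$(git rev-parse target)

# Case b: both sides have commits of their own.
echo c > c.txt && git add c.txt && git commit -q -m "target work"
git checkout -q source
echo d > d.txt && git add d.txt && git commit -q -m "more source work"
git checkout -q target
git merge -q -m "merge source into target" source

# Prints the merge commit followed by its two parents.
git rev-list --parents -n 1 target
```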

Now, what is rebase?

A rebase rewrites history in various ways (in fact, you can do a lot more
things than just simple rebasing, eg. edit or drop older commits, etc).

For example, 'git rebase origin/master' will look for the latest common
ancestor of the current branch and the target tree-ish (e.g.
refs/remotes/origin/master), start from the target tree-ish and apply the
changes made between that common ancestor and your current branch head
on top of it (possibly asking the user to manually resolve some
conflicts), and then replace the current branch head with the final head.

As it changes history, it should be used wisely.

A common problem with using rebase and public branches is:

* upstream changes history (eg. because he rebased onto his upstream)
* downstream (per default) merges this upstream into his branch
--> git will see two entirely different branches get merged, so
    there's a good chance of nasty conflicts, and history will
    easily get really ugly

So, if you do rebase your public branch, downstreams should also do so
(rebase their local branches on top of your public branch instead of
merging yours into theirs).
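
A sketch of how a downstream can follow a rewritten branch. The branch
names "upstream", "site" and "topic" are invented, and remembering the
old head of "site" in a shell variable is just one way of telling
rebase where the downstream commits start:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q demo && cd demo
git config user.name demo && git config user.email demo@example.invalid

echo v1 > template.conf && git add template.conf && git commit -q -m "upstream v1"
git checkout -q -b upstream

# Published site branch with one customization; a developer topic on top.
git checkout -q -b site
echo site > site.conf && git add site.conf && git commit -q -m "site customization"
git checkout -q -b topic
echo wip > topic.txt && git add topic.txt && git commit -q -m "topic work"

# Upstream releases v2 and the site branch is rebased onto it (rewritten).
git checkout -q upstream
echo v2 >> template.conf && git commit -q -a -m "upstream v2"
old_site=$(git rev-parse site)           # head of "site" before the rewrite
git checkout -q site && git rebase -q upstream

# Downstream follows by rebasing, not merging: replay only the commits
# made after the old site head onto the rewritten site branch.
git rebase -q --onto site "$old_site" topic

git log --oneline upstream..topic
```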

By the way: there are several more kinds of rebases, which are very
interesting for complex or sophisticated workflows, eg:

* --onto rebase: instead of letting git find out the starting point
  of the commit sequence to apply on the target tree-ish, you define it
  explicitly (e.g. if you want it to forget about things previous to
  the starting tree-ish).
* interactive rebase: 
  a) is able to reconstruct merges
  b) allows to cut into the sequence and change, drop or add new commits

These operations are very useful for cleaning up the history, especially
with things like a topic-branch workflow (e.g. if you originally have some
hackish and unclean commits and you want to put a clean and self-consistent
one into your mainline instead).
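
One possible way to do such a clean-up without typing into an editor is
the fixup/autosquash variant of interactive rebase. This is only a
sketch, not the only way to do it; the branch names "mainline" and
"topic" and the file contents are made up.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q demo && cd demo
git config user.name demo && git config user.email demo@example.invalid
echo base > file.txt && git add file.txt && git commit -q -m "base"
git checkout -q -b mainline
git checkout -q -b topic

echo "feature" > feature.txt && git add feature.txt
git commit -q -m "add feature"
# A hackish follow-up, recorded as a fixup of the previous commit.
echo "feature, fixed" > feature.txt && git commit -q -a --fixup=HEAD

# Fold the fixup into its target. GIT_SEQUENCE_EDITOR=true accepts the
# todo list that --autosquash prepared, so no editor opens.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash mainline

# A single clean commit is left on the topic branch.
git log --oneline mainline..topic
```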

cu
-- 
Mit freundlichen Grüßen / Kind regards 

Enrico Weigelt 
VNC - Virtual Network Consult GmbH 
Head Of Development 

Pariser Platz 4a, D-10117 Berlin
Tel.: +49 (30) 3464615-20
Fax: +49 (30) 3464615-59

enrico.weigelt@xxxxxxx; www.vnc.de 