Re: VCS comparison table

Carl Worth <cworth@xxxxxxxxxx> · Sat, 21 Oct 2006 13:47:08 -0700

On Sat, 21 Oct 2006 08:01:11 -0500, "Matthew D. Fuller" wrote:
> I think we're getting into scratched-record-mode on this.

I apologize if I've come across as beating a dead horse on this. I've
really tried to respond only where I was still confused, or where there
were explicit indications that the reader hadn't understood what I was
saying, ("I don't understand how you've come to that conclusion",
etc.). I'll be even more careful about that below, labeling paragraphs
as "I'm missing something" or "Maybe I wasn't clear".

> G: So use revids everywhere.
>
> B: Revnos are handier tools for [situation] and [situation] for
>    [reason] and [reason].

I'm missing something:

I still haven't seen strong examples for this last claim. When are
they handier? I asked a couple of messages back and two people replied
that given one revno it's trivial to compute the revno of its
parent. But that's no win over git's revision specifications,
(particularly since they provide "parent of" operators).

> > It may be that the centralization bias
>
> I think it's more accurately describable as a branch-identity bias.
> The git claim seems to be that the two statements are identical, but I
> have some trouble swallowing that.

Maybe I wasn't clear:

There's no doubt that there has been semantic confusion over the term
"branch" that has been confounding communication on both sides. Here's
my attempt to describe the situation, (which only became this clear
recently as I started playing with bzr more). This is not an attempt
at a complete description, but is hopefully accurate, neutral, and
sufficient for the current discussion:

  Abstract: In a distributed VCS we are using a distributed process to
  create a DAG, (nodes are associated with revisions and point to parent
  nodes). The distributed nature means that the collective DAG will have
  multiple source nodes, (often termed heads or tips).

  Git: A subset of the DAG is stored in a "repository". The DAG in the
  repository may have many source nodes. A "branch" is a named reference
  to a node (whether or not a source). Multiple local repositories may
  share storage for common objects. There are inter-repository commands
  for copying revisions and adjusting branch references, but basically
  all other operations act within a single repository.

  Bzr: A subset of the DAG is stored in a "branch". The DAG in the
  branch has a single source node. Multiple local branches may share
  storage for common objects through a "repository". Basically all
  operations (where applicable) can act between branches.

Let me know if I botched any of that.

One concept that is really not introduced in the above is the
colloquial concept of a "branch" as a "line of development". In my
experience, this notion is a fundamentally short-lived thing. For
example, work happens on a feature branch for a while, and then it
gets merged into the mainline. After the merge, there's not that much
significance to the branch anymore. In a sense, it no longer exists
but for a few edges in the graph.

I imagine that git and bzr users both use this short-lived aspect
in practice. After merging, git users drop their branch references and
bzr users drop their directories containing their branches. Anything
else would be unwieldy as the number of merged-in, "uninteresting"
branches would grow without bound and there wouldn't be any advantage
to keeping them around.

But dropping a merged branch in bzr means throwing away the ability to
reference any of its commits by its custom, branch-specific revision
numbers. And the revision numbers _do_ change: pull, branch, and merge
all introduce revision number differences between branches, (or
changes within a branch in the case of pull). And there is no simple
way to correlate the numbers between branches.

Maybe you can argue that there isn't any centralization bias in
bzr. But anyone who claims that the revnos are stable really is
talking from a standpoint that favors centralization.

But, here's a unifying point about git and bzr. Git also allows
branch-specific, unstable names for revisions. And they're even more
unstable than the ones bzr generates. But there are some important
differences between how they are used, (both by the tool and by
people).

To illustrate, yesterday I gave an example where performing a bzr
branch from a dotted-decimal revision would rewrite the numbers from
the originating branch (1.2.2, 1.2.1, and 1) to unrelated numbers in
the new branch (3, 2, 1). I was surprised at first, and couldn't
imagine any sane reason for the tool to go off and invent new names.
It prevents a user of the new branch from referencing any commits by
their original names. It also prevents the user from communicating
with anyone using these new names, (unless the user publishes the
branch, and any parties to the communication retain the new branch for
as long as said communication might be referenced).
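
A toy model makes the renumbering easy to see. This is a sketch under a
simplifying assumption, not bzr's actual algorithm: a branch's revnos are
taken to be positions along the left-hand (first-parent) ancestry of its
tip, counted from the root as 1, and dotted revnos are not computed at
all. The helper mainline_revnos and the revision ids are hypothetical; X
and Y stand in for the revisions called 1.2.1 and 1.2.2 above.

```python
from typing import Dict, List, Optional

# Toy model (assumption): each revision id maps to its parents, and the
# first parent is the left-hand (mainline) parent. Ids are hypothetical.
Graph = Dict[str, List[str]]


def mainline_revnos(graph: Graph, tip: str) -> Dict[str, int]:
    """Number the left-hand ancestry of `tip`, counting the root as 1."""
    chain: List[str] = []
    rev: Optional[str] = tip
    while rev is not None:
        chain.append(rev)
        parents = graph[rev]
        rev = parents[0] if parents else None
    chain.reverse()  # root first
    return {r: i + 1 for i, r in enumerate(chain)}


graph: Graph = {
    "A": [],          # common root
    "B": ["A"],       # mainline work
    "X": ["A"],       # plays the role of 1.2.1 (side line off A)
    "Y": ["X"],       # plays the role of 1.2.2
    "M": ["B", "Y"],  # merge of the side line into the mainline
}

original = mainline_revnos(graph, "M")  # the original branch, tip M
branched = mainline_revnos(graph, "Y")  # a new branch made from Y
print(original)  # {'A': 1, 'B': 2, 'M': 3}
print(branched)  # {'A': 1, 'X': 2, 'Y': 3}
```

In this model the number 3 names M in the original branch and Y in the
new one, and neither numbering records that Y was ever anything else.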

But then I realized why bzr is doing this. It's because bzr users
don't just use the revision numbers for external communication, but
they also use them for lots of direct interaction with the tool. The
rewriting makes it easy to write something like "bzr diff -r1..3".

And it turns out that git also allows branch-specific naming for the
exact same reason. In place of 3, 2, 1 in the same situation git would
allow the names HEAD, HEAD~1, and HEAD~2 to refer to the same three
revisions. So the easy diff command would be "git diff HEAD~2 HEAD".
(And where I have HEAD here I could also use any branch name, or any
other reference to a commit as well.)
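
A toy resolver for the NAME~n form shows why these names need no lookup:
start from whatever the name points to and follow first parents n times.
This is a sketch, not git's implementation. It handles only a single ~
suffix (no chaining, no ^ operators), and resolve, refs, and parents are
hypothetical names for an in-memory model of the graph.

```python
from typing import Dict, List


def resolve(spec: str, refs: Dict[str, str], parents: Dict[str, List[str]]) -> str:
    """Resolve 'NAME', 'NAME~' or 'NAME~n' in a toy commit graph.

    Only a single '~' suffix is supported; as with git's '~', each step
    follows the first parent.
    """
    name, sep, count = spec.partition("~")
    commit = refs[name]
    steps = int(count) if count else (1 if sep else 0)
    for _ in range(steps):
        commit = parents[commit][0]
    return commit


# Hypothetical linear history c1 <- c2 <- c3, with HEAD and master at c3.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
refs = {"HEAD": "c3", "master": "c3"}

# The two endpoints of "git diff HEAD~2 HEAD" in this model:
print(resolve("HEAD~2", refs, parents), resolve("HEAD", refs, parents))  # c1 c3
print(resolve("master~", refs, parents))  # c2
```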

So there are two fundamentally different uses for names, (and Linus
recently talked about this at some length): 1. day-to-day working with
the tool and 2. externally communicating about specific revisions.

Both bzr and git allow for unstable, branch-specific names to be used
as a convenience for day-to-day work. Maybe part of the reason some
people dislike git's "ugly" names so much is that they imagine that to
compare two revisions a git user must inspect the logs, fish out the
sha1sum for each, and then cut-and-paste to create the command needed.
I agree that if that were required, it would be exceedingly painful.
But that's not required; what git users actually use is branch names
and simple variations on them.

Now, there are some important differences in the unstable names that
git and bzr have. Most importantly, git's are even less stable, (with
respect to the association between a name and any specific
revision). With every commit, all of the git names effectively shift
as the branch moves, (HEAD points to the new commit, HEAD~1 points to
what HEAD previously pointed to). This is remarkably useful since it
provides stability in terms of what the user cares about, (the latest
commit and its closest ancestors). This means that "diff from
grandparent to current commit" is always "git diff HEAD~2 HEAD",
whereas in bzr it is "bzr diff -r<X-2>..<X>" and the user actually
does need to look up X first, (unless there's more to the bzr revision
specification than I've seen).
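
The two directions of numbering can be compared in a small Python
sketch. The commit ids and the first_parent_chain helper are
hypothetical, and treating a bzr revno as a position counted from the
root is a simplifying assumption that only holds for a branch that gains
commits on top (it ignores the pull case mentioned earlier).

```python
from typing import Dict, List, Optional


def first_parent_chain(tip: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return the first-parent ancestry of `tip`, tip first and root last."""
    chain: List[str] = []
    rev: Optional[str] = tip
    while rev is not None:
        chain.append(rev)
        ps = parents[rev]
        rev = ps[0] if ps else None
    return chain


# Hypothetical linear history c1 <- c2 <- c3.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
before = first_parent_chain("c3", parents)

parents["c4"] = ["c3"]  # one new commit on top of c3
after = first_parent_chain("c4", parents)

# Relative name HEAD~n is chain[n]: it moves with every commit.
print(before[1], after[1])    # c2 c3
# Root-counted number k is chain[-k]: it stays on the same revision.
print(before[-2], after[-2])  # c2 c2
```

After one commit, HEAD~1 names a different revision but still means "the
parent of what I just committed", while number 2 still names c2 and the
user has to know the new X to talk about the latest commits.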

Finally, since these branch-specific names are changing all the time,
there's never any temptation for people to attempt to use them for
external communication. In contrast, by being numbered in the opposite
direction, bzr revision numbers give a false appearance of stability
and people _do_ use them for communication. This is the mistake we've
been warning bzr users about in this thread.

Also, since the git names are so predictable, git almost never emits
them. It accepts them as names just fine, but it doesn't generate
them, (log and commit never show the branch-specific names). I think
the only git command that can even emit such a name is the recently
added git-name-rev, which exists solely for the purpose of mapping a
commit identifier to a local, branch-specific name which might have
more intuitive meaning for the user.
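
For illustration only, the idea behind git-name-rev can be sketched as a
search from each branch tip along first parents, reporting the closest
hit as NAME~n. The real command does more than this (it also follows
other parents, producing names like master~3^2, and has options such as
--tags), so name_rev and the graph here are hypothetical simplifications.

```python
from typing import Dict, List, Optional, Tuple


def name_rev(target: str, refs: Dict[str, str],
             parents: Dict[str, List[str]]) -> Optional[str]:
    """Name `target` as 'REF~n' with the smallest n (first parents only)."""
    best: Optional[Tuple[str, int]] = None
    for ref, tip in sorted(refs.items()):
        commit, n = tip, 0
        while True:
            if commit == target:
                if best is None or n < best[1]:
                    best = (ref, n)
                break
            ps = parents[commit]
            if not ps:
                break  # reached a root without finding target
            commit, n = ps[0], n + 1
    if best is None:
        return None
    ref, n = best
    return ref if n == 0 else f"{ref}~{n}"


# Hypothetical graph: master is c1 <- c2 <- c3 <- c4, feature is f1 on top of c2.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"], "f1": ["c2"]}
refs = {"master": "c4", "feature": "f1"}

print(name_rev("c3", refs, parents))  # master~1
print(name_rev("c2", refs, parents))  # feature~1
print(name_rev("c4", refs, parents))  # master
```

Either way, the output is a derived, local label for the user to read,
not an identifier to send to someone else; the sha1 remains the stable
name.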

So the fact that things like git-log don't print these names also
helps avoid any trap of users trying to communicate with something
unstable.

-Carl