Re: VCS comparison table

Carl Worth <cworth@xxxxxxxxxx> · Fri, 20 Oct 2006 14:48:52 -0700

On Thu, 19 Oct 2006 21:06:40 -0400, Aaron Bentley wrote:
> I understand your argument now.

Well, I'm glad to know we each feel like we are communicating at
times, here.

>                                  It's nothing to do with numbers per se,
> and all about per-branch namespaces.  Correct?

The entire discussion is about how to name things in a distributed
system. The premise that Linus has put forth in a very compelling way,
is that attempting to use sequential numbers for names in a
distributed system will break down. The breakdown could be that the
names are not stable, or that the system is used in a centralized way
to avoid the instability of the names.

Now, that causality might not accurately describe the way bzr has
developed. It may be that the centralization bias was determined by
other reasons, and that given those, using sequential numbers for
names makes perfect sense.

But it really is fundamental and unavoidable that sequential numbers
don't work as names in a distributed version control system.

> I meant that the active branch and a mirror of the abandoned branch
> could be stored in the same repository, for ease of access.

Granted, everything can be stored in one repository. But that still
doesn't change what I was trying to say with my example. One of the
repositories would "win" (the names it published during the fork would
still be valid). And the other repository would "lose" (the names it
published would be not valid anymore). Right?

Now, maybe there's some "simple" mapping from old names to new names
for the losing repository, (something like adding a prefix of
"losers/" to the beginning of the names or something or adding a "15."
prefix or whatever). The point is that the old names are
invalidated. And there's no way to guarantee this kind of change won't
happen in the future, (no matter how old a project is).

I constructed that example to show that the naming has a social impact
in forcing a distinction between winners and losers in the merge, (or
mainline and side branch, or whatever you want to name the
distinction). The two re-joining projects could be really amiable,
create a new virgin mainline and treat both histories as side
branches. In this version, everyone loses as all the old names are
invalidated.

> Bazaar encourages you to stick lots and lots of branches in your
> repository.  They don't even have to be related.  For example, my repo
> contains branches of bzr, bzrtools, Meld, and BazaarInspect.

Git allows this just fine. And lots of branches belonging to a single
project is definitely the common usage. It is not common (nor
encouraged) for unrelated projects to share a repository, since a git
clone will fetch every branch in the repository. common for a single
base URL to provide a common basis for a hierarchy of git
repositories, (see, for example http://repo.or.cz/), and that may
provide similar benefits.

I'm noticing another terminology conflict here. The notion of "branch"
in bzr is obviously very different than in git. For example the bzr
man page has a sentence beginning with "if there is already a branch
at the location but it has no working tree". I'm still not sure
exactly what a bzr branch is, but it's clearly something different
from a git branch, (which is absolutely nothing more than a name
referencing a particular commit object). [Note: after playing with it
a bit more down below, a bzr "branch" appears to be something like a
git "repository" that can only hold a single branch.]

> I can see where you're coming from, but to me, the trade-off seems
> worthwhile.  Because historical data gets less and less valuable the
> older it gets.  By the time the URL for a branch goes dark, there's
> unlikely to be any reason to refer to one of its revisions at all.

I strongly disagree on this point. One, I don't think that the "time
for a branch to go dark" is necessarily long, (or if it is, then
that's another barrier that's setup against distributed
development---people have to have a long-term repository before they
can usefully start publishing a branch). Second, I'm not comfortable
with any limit on usefulness of history. Would you willingly throw
away commits, mailing list posts, or closed bug reports older than any
given age for any projects that you care about?

> When you create a new branch from scratch, the number starts at zero.
> If you copy a branch, you copy its number, too.
>
> Every time you commit, the number is incremented.  If you pull, your
> numbers are adjusted to be identical to those of the branch you pulled from.
>
> Is that really complicated?

OK. So now I had to actually try things out. I went ahead and
installed bzr and was able to init and commit from the man page. I had
to go to IRC to figure out how to create and change branches, (the
documentation for "bzr branch" just said FROM_LOCATION and TO_LOCATION
and I couldn't figure out what to pass for those).

Here's the setup I came up with for a tweaked version of the a[bc]m
diamond example I showed with git earlier, (I just added a second
commit to each branch before merging):

	mkdir bzrtest; cd bzrtest
	mkdir master; cd master; bzr init
	touch a; bzr add a; bzr commit -m "Initial commit of a"
	cd ..
	bzr branch master b; cd b
	touch b; bzr add b; bzr commit -m "Commit b on b branch"
	echo "change" > b; bzr commit -m "Change b on b branch"
	cd ..
	bzr branch master c; cd c
	touch c; bzr add c; bzr commit -m "Commit c on c branch"
	echo "change" > c; bzr commit -m "Change c on c branch"
	cd ../master
	bzr merge ../b; bzr commit -m "Merge in b"
	bzr merge ../c; bzr commit -m "Merge in c"

First, I've been told that this is a lot less efficient than possible
since I have what in bzr terms is three unshared "branches" here,
(what git would really call three separate "repositories").

Second, I think that using the filesystem for separating branches is a
really bad idea. One, it intrudes on my branch namespace, (note that
in many commands above I have to use things like "../b" where I'd like
to just name my branch "b". Two, it prevents bzr from having any
notion of "all branches" in places where git takes advantage of it,
(such as git-clone and "gitk --all"). Three, it certainly encourages
the storage problem I ran into above, (and I'd be interested to see a
"corrected" version of the commands above to fix the storage
inefficiencies).

But anyway, those are all new topics, what we were trying to talk
about is revision numbers. After the above commands I can run bzr log
in my three branches, master, b, and c and I get the following
revision number sequences:

master: 1 2 3
b: 1 2 3
c: 1 2 3

And from this state if I ask questions with bzr missing and look at
just the revision numbers, then the answers are useless. I get answers
like:

	.../b:$ bzr missing ../c
	You have 2 extra revision(s):
	revno: 3
	  Change b on b branch
	revno: 2
	  Commit b on b branch

	You are missing 2 revision(s):
	revno: 3
	  Change c on c branch
	revno: 2
	  Commit c on c branch

	.../b:$ bzr missing ../master
	You are missing 2 revision(s):
	revno: 3
	  Merge in c
	revno: 2
	  Merge in b

So there we have the revision numbers 2 and 3 each being used to name
three different revisions. That's a lot of aliasing already.
Then, if the b and c branches each treat master as their mainline and
each pull, then both branches get their numbers all shuffled.

Oh, drat. I just realized that I'm running 0.11 here which doesn't
have the dotted-decimal numbers. (I'm trying to get bzr.dev too, but
it appears to be stuck about 40% of the way through "Fetch phase
1/4" [Note: it ). In this version, the commits brought in as part of a merge
don't get any "simple" number at all and instead "bzr log" shows a
merge ID.

I hadn't realized that the dotted decimal notation was so new that the
community hadn't had a lot of experience with it yet. But, your
description doesn't actually presume that notation. What you asked
was:

	> When you create a new branch from scratch, the number starts at zero.
	> If you copy a branch, you copy its number, too.
	>
	> Every time you commit, the number is incremented.  If you pull, your
	> numbers are adjusted to be identical to those of the branch you pulled from.
	>
	> Is that really complicated?

And to answer. That description doesn't describe at all what happens
to the "simple" numbers of commits that are merged. In the version I
have, they disappear and get replaced with "ugly" numbers. In 0.12
something else happens instead, (that's the part I don't understand
yet).

And my argument isn't just "confusing" it's "confusing or
useless". I understand that pull destroys numbers, and how, but that
makes the numbers I had generated earlier useless. I still don't
understand how people can avoid number changing, (since pull seems the
only way to synch up without infinite new merge commits being added
back and forth).

So, yes, it really is complicated or my brain is just too small.

> > The naming in git really is beautiful and beautifully simple.
>
> Well, you've got to admit that those names are at least superficially ugly.

Sure. But I'll gladly take a simple system with superficial warts than
a complex system with superficial beauty.

> What's nice is being able see the revno 753 and knowing that "diff -r
> 752..753" will show the changes it introduced.  Checking the revo on a
> branch mirror and knowing how out-of-date it is.

With git I get to see a revision number of b62710d4 and know that
"diff b62710d4^ b62710d4" will show its changes, though much more
likely just "show b62710d4". I really cannot fathom a place where
arithmetic on revision numbers does something useful that git revision
specifications don't do just as easily. Anybody have an example for
me?

-Carl

PS. The "bzr branch" of bzr.dev did eventually finish. I can see the
dotted-decimal numbers in my example now, (1.1.1 and 1.2.2 for the
commits that came from branch b; 1.2.1 and 1.2.2 for the commits that
came from branch c). At 5 characters a piece these are well on their
way to getting just as "ugly" as git names, (once it's all
cut-and-paste the difference in ugliness is negligible).

And now, I see it's not just pull that does number rewriting. If I use
the following command (after the chunk of commands above):

	cd ..; bzr branch -r 1.2.2 master 1.2.2

It appears to just create newly linearized revision numbers from whole
cloth for the new branch (1, 2, and 3 corresponding to mainline 1,
1.2.1, and 1.2.2). That's totally surprising, very confusing, and
would invalidate any use I wanted to make of published revision
numbers for the mainline branch while I was working on this branch.

See? This stuff really doesn't work.

Motivating scenario for the above: Imagine 1.2.3 commited garbage so I
want to fix it by branching from 1.2.2 rather than the mainline
"2". Then after I branch, I learn something about "1.2.1" that I want
to investigate more closely. I try to inspect that in my branch, but
ouch! I don't have that revision.

Is there even a way to say "show me the change introduced by what is
named '1.2.1' in the source branch in this scenario" ?

Note: In #bzr I just learned that there is a way for me to do this
_if_ I also happen to have a pull of the original branch somewhere on
my machine. Something like:

	bzr diff -r1.2.0:../master -r1.2.1:../master

I don't know if there's a way to get diff's .. notation to work with
that, (I can't manage to). But these simple numbers are getting less
simple all the time.

With git, if I find a revision number somewhere, I can cut-and-paste
it and get the right thing:

	git show b62710d4f8602203d848daf2d444865b611fff09

But with bzr if I find "1.2.1" somewhere I'm likely to type:

	bzr diff -r1.2.0..1.2.1

If I'm lucky, then that fails with:

	bzr: ERROR: Requested revision: '1.2.0' does not exist in branch:

and I go back to the source, find out what branch it was referring to,
remember where that is on my machine (../master, say), and manually
type that to my command line to get:

	bzr diff -r1.2.0:../master -r1.2.1:../master

If I'm unlucky then the first diff comes up with some unrelated commit
and I get to be confused before I go through that same process.

Now do you see? It really, really does not work. This stuff is about
as un-simple as could be, and this things will happen.
Attachment:
pgp13ZS3fXl0v.pgp

Description: PGP signature