Re: VCS comparison table

Jakub Narebski <jnareb@xxxxxxxxx> · Fri, 20 Oct 2006 11:51:12 +0200

James Henstridge wrote:
> On 20/10/06, Carl Worth <cworth@xxxxxxxxxx> wrote:
>> On Thu, 19 Oct 2006 19:01:58 -0400, Aaron Bentley wrote:

>>>             Additionally, the new mainline can keep a mirror of the
>>> abandoned mainline in its repository, because there are virtually no
>>> additional storage requirements to doing so.
>>
>> And this part I don't understand. I can understand the mainline
>> storing the revisions, but I don't understand how it could make them
>> accessible by the published revision numbers of the "abandoned"
>> line. And that's the problem.
> 
> With this sort of setup, I would publish my branches in a directory
> tree like this:
> 
>     /repo
>         /branch1
>         /branch2
> 
> I make "/repo" a Bazaar repository so that it stores the revision data
> for all branches contained in the directory (the tree contents,
> revision meta data, etc).

And here we have a feature which is as far as I see unique to git,
namely to have persistent branches with _separate namespace_. It means
that we can have hierarchical branch names (including names like
"remotes/<remotename>/<branch of remote>", or "jc/diff"), and we don't
have to guess where repository name ends and branch name begins.

The idea of "branches (and tags) as directories" was if I understand
it correctly introduced by Subversion, and from what can be seen from
troubles with git-svn (stemming from the fact that division between
project name and branch name is the matter of _convention_) at least
slightly brain-damaged.

> The "/repo/branch1" essentially just contains a list of mainline
> revision IDs that identify the branch.  This could probably be just
> store the head revision ID, but there are some optimisations that make
> use of the linear history here.
> 
> If the ancestry of "/repo/branch2" is a subset of branch1 (as it might
> be if the in the case of forked then merged projects), then all its
> revision data will already be in the repository when branch1 was
> imported.  The only cost of keeping the branch around (and publishing
> it) is the list of revision IDs in its mainline history.
> 
> For similar reasons, the cost of publishing 20 related Bazaar branches
> on my web server is generally not 20 times the cost of publishing a
> single branch.
> 
> I understand that you get similar benefits by a GIT repository with
> multiple head revisions.

You can get similar benefits by a GIT repository with shared object
database using alternates mechanism. And that is usually preferred
over storing unrelated branches, i.e. branches pointing to disconnected
DAG (separate trees in BK terminology) of revision, if that you mean by
multiple head revisions (because in GIT there is no notion of "mainline"
branch, only of current (HEAD) branch).

>>>> But for these communications, revision numbers will not provide
>>>> historically stable values that can be used.
>>>
>>> They certainly can.
>>>
>>> The coder says "I've put up a branch at http://example.com/bzr/feature.
>>>  In revision 5, I started work on feature A.  I finished work in
>>> revision 6.  But then I had to fix a related bug in revision 7."
>>
>> "I've put this branch up" isn't historically stable...
> 
> With the repository structure mentioned above, the cost of publishing
> multiple branches is quite low.  If I continue to work on the project,
> then there is no particular bandwidth or disk space reasons for me to
> cut off access to my old branches.
> 
> For similar reasons, it doesn't cost me much to mirror other people's
> related branches if I really care about them.

But the revision number in this case _changes_. It is from 7 to
branch:7 but still it changes somewhat.

[...]
>> The naming in git really is beautiful and beautifully simple.
> 
> I don't think anyone is saying that universally unique names are bad.
> But I also don't see a problem with using shorter names that only have
> meaning in a local scope.
> 
> I've noticed some people using abbreviated SHA1 sums with GIT.  Isn't
> that also a case of trading potential global uniqueness for
> convenience when working in a local scope?

Emphasisis on _potential_. SHA1 id abbreviated to 6 characters might
be not unique in larger project, but for example the chance that
SHA1 id abbreviated to 7 or 8 characters is not unique is really low.

Yet another analogy:

SHA1 identifiers of commits (and not only commits) can be compared
to Message-Ids of Usenet messages, while revision numbers can be compared
to Xref number of Usenet message which if I understand correctly is unique
only for given news server. But Message-Ids cannot be shortened
meaningfully like SHA1 ids can; newertheless they are used in communication
without any problems. Even if namespace is not simple ;-)

-- 
Jakub Narebski
Poland
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html