Re: Comments on "Understanding Version Control" by Eric S. Raymond

Jakub Narebski <jnareb@xxxxxxxxx> · Tue, 10 Feb 2009 02:20:06 +0100

On Mon, 2 Feb 2009, Jakub Narebski wrote:

UVC = "Understanding Version-Control Systems" (draft),
http://www.catb.org/esr/writings/version-control/version-control.html

> UVC> = What, if anything, have we learned from history? =
> UVC> 
> UVC> There's a folk saying that "It's not what you don't know that
> UVC> hurts you, it's what you think you know that ain't so." In
> UVC> examining the pattern of development of VCSes, it seems to me
> UVC> that this sub-field of computer science has been less
> UVC> hampered than most by difficulties in finding appropriate
> UVC> techniques, but more hampered than most by wrong assumptions that
> UVC> hung on far longer than they should have. 
> UVC> 
> UVC> First wrong assumption: Conflict resolution by merging is
> UVC> intractably difficult, so we'll have to settle for locking. It
> UVC> took at least fifteen and arguably twenty years for VCS designers
> UVC> to get shut of that one. But it's historical now. 
> UVC> 
> UVC> Second wrong assumption: Change history representation as a
> UVC> snapshot sequence is perfectly dual to the representation as
> UVC> change/add/delete/rename sequences. This folk theorem is well
> UVC> expressed in the 2004 essay "On Arch and Subversion"[3]. It is
> UVC> appealing, widely held, and dead wrong.
> UVC> 
> UVC> File renames break the apparent symmetry. The failure of
> UVC> snapshot-based models to correctly address this has caused
> UVC> endless design failures, subtle bugs, and user misery. 
> 
> This is not true.  Snapshot-based Git, whose rename detection deals
> very well in practice with file renames, contradicts this theory.
> Bazaar, which is supposedly snapshot-based yet supports "container
> identities" ('file-ids'), contradicts it further.

Now, after thinking about this a bit, I reckon that the second wrong
assumption is not that the snapshot-sequence representation is
perfectly dual to the changeset representation, because in practice
(as in: merge doesn't take exponential time in history size) it is.
It is not even the assumption that renames are unimportant, or in
other words, a failure to deal correctly with renames and copies.
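
To illustrate how a rename can be recovered from two plain snapshots,
here is a minimal Python sketch.  It is an assumption-laden toy, not
Git's actual algorithm: the function name diff_snapshots, the
dict-of-strings snapshot format, and the 0.5 similarity threshold are
all made up for the example.  It pairs each vanished path with the most
similar newly appeared path and reports the pair as a rename.

```python
from difflib import SequenceMatcher


def diff_snapshots(old, new, threshold=0.5):
    """Derive a changeset, with inferred renames, from two snapshots.

    A snapshot is a dict mapping path -> file content (str).
    The threshold is a hypothetical similarity cutoff in [0, 1].
    """
    deleted = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    changes = [("modify", p) for p in sorted(old)
               if p in new and old[p] != new[p]]

    # Pair each deleted path with the most similar added path, if any.
    for src in deleted[:]:
        best, best_score = None, threshold
        for dst in added:
            score = SequenceMatcher(None, old[src], new[dst]).ratio()
            if score >= best_score:
                best, best_score = dst, score
        if best is not None:
            changes.append(("rename", src, best))
            deleted.remove(src)
            added.remove(best)

    # Whatever could not be paired is a plain delete or add.
    changes += [("delete", p) for p in deleted]
    changes += [("add", p) for p in added]
    return changes


old = {"README": "hello\n", "util.py": "def f():\n    return 1\n"}
new = {"README": "hello world\n",
       "lib/util.py": "def f():\n    return 1\n\n"}
print(diff_snapshots(old, new))
```

Nothing about the rename was recorded in either snapshot; it is
inferred after the fact from content similarity alone.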

No, the second wrong assumption (if we want to phrase the lessons of
version control history in these terms) is the failure to realize
that it is _merging_ that has to be easy.  It has to be easy both for
branching (stable and development branches; feature branches) and for
collaboration: the distributed part of distributed version control
systems (Linus' "network of trust").  An intelligent, rename-aware
merge strategy is a _necessary_ component of automated merging.
Necessary, and very important, but only a _component_.
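
As a rough illustration of what "rename-aware" means for a merge, the
Python sketch below does a three-way merge of whole-file snapshots and
follows renames on either side.  Everything in it is hypothetical
(the names find_renames and merge_trees, the dict-of-strings snapshot
format, the 0.5 threshold), and it merges files whole rather than line
by line, so it is far cruder than any real merge strategy.

```python
from difflib import SequenceMatcher


def find_renames(base, side, threshold=0.5):
    """Map old path -> new path for files that look renamed in `side`."""
    gone = [p for p in sorted(base) if p not in side]
    new = [p for p in sorted(side) if p not in base]
    renames = {}
    for src in gone:
        scored = [(SequenceMatcher(None, base[src], side[dst]).ratio(), dst)
                  for dst in new if dst not in renames.values()]
        if scored:
            score, dst = max(scored)
            if score >= threshold:
                renames[src] = dst
    return renames


def merge_trees(base, ours, theirs):
    """Three-way merge of snapshots (path -> content), following renames.

    Returns (merged, conflicts).  If only one side changed a file, that
    side wins; if both changed it differently, the path is a conflict.
    """
    ours_ren = find_renames(base, ours)
    theirs_ren = find_renames(base, theirs)
    merged, conflicts = {}, []
    seen_ours, seen_theirs = set(), set()

    for path, base_content in base.items():
        o_path = ours_ren.get(path, path)
        t_path = theirs_ren.get(path, path)
        seen_ours.add(o_path)
        seen_theirs.add(t_path)
        o, t = ours.get(o_path), theirs.get(t_path)  # None means deleted

        if o_path != path and t_path != path and o_path != t_path:
            conflicts.append(path)  # both sides renamed it differently
            continue
        dest = o_path if o_path != path else t_path

        if o == t:
            content = o
        elif o == base_content:
            content = t  # only theirs changed (or deleted) it
        elif t == base_content:
            content = o  # only ours changed (or deleted) it
        else:
            conflicts.append(path)
            continue
        if content is not None:
            merged[dest] = content

    # Files that exist only on one side and are not rename targets.
    for side, seen in ((ours, seen_ours), (theirs, seen_theirs)):
        for p, c in side.items():
            if p in seen:
                continue
            if p in merged and merged[p] != c:
                conflicts.append(p)
            else:
                merged[p] = c
    return merged, conflicts


base = {"a.txt": "line1\nline2\n", "keep.txt": "k\n"}
ours = {"b.txt": "line1\nline2\n", "keep.txt": "k\n"}           # renamed
theirs = {"a.txt": "line1\nline2\nline3\n", "keep.txt": "k\n"}  # edited
print(merge_trees(base, ours, theirs))
```

Here one side renames a.txt to b.txt while the other edits a.txt; the
merge result carries the edit over to b.txt instead of reporting a
delete/modify conflict, which is the behaviour a path-only merge would
get wrong.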

That is what Subversion, at least up to Subversion 1.5, got wrong.
It made branching (or a facsimile / cheap imitation of branching)
easy, but it *didn't* make merging easy.  Even in SVN 1.5 merging is
not, from what I understand, very easy.

Easy merging is extremely important for DVCS in OSS development, as
a centralized VCS that requires commit rights usually does not
scale up to the sizes required by larger OSS projects, especially
those with a diverse set of developers.

P.S. By the way, the hgbook contains quite a good description of DVCS;
a description of the beginnings of Git can be found on the GitHistory
page on the Git Wiki; and you can find the history of added features
and of changes to the design and UI of Git in Junio C Hamano's
"Git Chronicles", presented at GitTogether'08.
-- 
Jakub Narebski
Poland