Re: Comments on "Understanding Version Control" by Eric S. Raymond

Theodore Tso <tytso@xxxxxxx> · Wed, 4 Feb 2009 18:54:36 -0500

On Wed, Feb 04, 2009 at 03:04:02AM +0100, Jakub Narebski wrote:
> 
> I guess that this mailing list is subscribe-only, isn't it?  So doing
> CC to uvc-reviewers wouldn't, unfortunately, cut?

According to the Wayback Archive's record of the uvc-reviewers mailman
listinfo page, the list was open for anyone to join, and the archives
were public, which is why I don't mind sharing the archives with
anyone who asks.

> > I'll include some of my writings on the subject from the uvc-reviewers
> > mailing list so folks can see where some of this discussion went last
> > time...  (All of this dates from January, 2008, when Eric was last
> > aggressively updating the paper in question.)
> 
> Thank you very much for those excerpts / fragments, even though
> I'd rather have your fresh comments either on current state of 
> "Understanding Version-Control Systems", or on my post.

My comments haven't changed; as you probably noted, I agree with you,
and my arguments largely parallel yours.  I was using a Reductio ad
absurdum argument to show that the same argument that claims that Git
is a primitive, hackish, SCM because it doesn't record user intention
vis-a-vis file renames could also be extended to say that use of all
current DSCMs amounts to "Programming Malpractice" because they don't
allow the recording of higher level "user intentions" such as the
renaming of variables, functions, types, and class names.

My comments date from the very end of January 2008, when Eric stopped
updating his paper, and before he could start doing an extensive
description and evaluation of bzr, Mercurial and Git, so it's not
surprising that they are still relevant today.  I suspect that when he
picks up this draft again, and starts writing these sections covering
modern distributed SCMs, the sections for Mercurial, Git, Bzr,
et al. will cause a huge amount of controversy, because even though
he is claiming to be unbiased, it is very clear in the draft to
date that he would very much like to draw a grand sweeping picture of
progress and evolution starting from "first generation systems" (RCS,
SCCS, et al.), to "second generation systems" (CVS, SVN, et al.), to
"third generation systems" (Arch, Monotone, git, Mercurial, etc.).

There are hints in the draft that he views "container identity" as
the next "evolutionary idea" which "more primitive" systems do not
have, and "more evolved" systems do have.  This can be seen from this
excerpt from his draft:

	First wrong assumption: Conflict resolution by merging is
	intractably difficult, so we'll have to settle for locking. It
	took at least fifteen and arguably twenty years for VCS
	designers to get shut of that one. But it's historical now.

	Second wrong assumption: Change history representation as a
	snapshot sequence is perfectly dual to the representation as
	change/add/delete/rename sequences.. This folk theorem is well
	expressed in the 2004 essay On Arch and Subversion. It is
	appealing, widely held, and dead wrong.

	File renames break the apparent symmetry. The failure of
	snapshot-based models to correctly address this has caused
	endless design failures, subtle bugs, and user misery.

So you can see that Eric seems to believe quite strongly that the
failure to track file renames is as fundamental an error as what he
terms the "First Wrong Assumption".  He later admits that the idea is
controversial, and that people are still "grappling" with it, but I
think he's tipped his hand about what he believes the ultimate correct
answer is with respect to this issue.

I believe, as I think you do, that the hysteria that states that you
*must* record user intention leads inexorably to the requirement to
force users to indicate "intention" by popping up Annoying Dialog
Boxes whenever they suck in a patch that was sent via e-mail so that
the SCM can record information about whether a file rename had
happened in a particular commit.  I believe this requirement to
record user intentions and to pop up these Annoying Dialog Boxes is a
blind alley a la the vast amount of time wasted arguing over algorithms
such as Codeville precise merges.  I also believe that forcing users
to record "user intention" makes about as much sense as forcing users
to declare they are about to edit a file by explicitly taking locks on
files a la RCS.

I suspect Eric will disagree with me, but regardless of how he
completes his paper, it will almost certainly end up taking sides one
way or another on this controversy, at which point one side or the
other of this particular disagreement will argue that Eric is really
writing an advocacy paper pushing Bzr, Mercurial, or Git (depending on
how he comes out on this issue).

Your suggestion that the proof is going to be in the code makes a lot
of sense.  The real world examples I would suggest that we create, and
then demonstrate that git can handle (making enhancements to git where
necessary), are:

1) In branch A, the directory src/plugin/innodb-experimental is
   renamed to src/plugin/innodb, and in branch B, a commit (i)
   modifies a file src/plugin/innodb-experimental/table.c, and (ii)
   creates a file src/plugin/innodb-experimental/mod-schema.c.  This
   commit in branch B is then pulled into branch A, where the
   directory rename has taken place.  The user may not know that a
   directory rename had taken place under the covers, so they don't
   give any magic options when they run the "git cherry-pick" or "git
   merge" command.  Does the right thing happen such that the right
   file in src/plugin/innodb is modified, and the new file is created
   in src/plugin/innodb, even though in the original commit, the
   changes were made to files in src/plugin/innodb-experimental?

2) And does the right thing happen if the situation is as described
   above, but in branch C, which is descended from branch A, a new
   directory, src/plugin/innodb-experimental, is created, such that
   src/plugin/innodb and src/plugin/innodb-experimental both exist?
   Now the same commit from branch B is pulled into branch C.  Will
   the correct thing happen in that the correct files in
   src/plugin/innodb are modified and created, even though there is a
   new directory containing a completely unrelated plugin that happens
   to have the name "innodb-experimental"?

   BTW, it has been asserted that there exists at least one major open
   source project where this sort of thing happens quite often, and
   the fact that git did not do the right thing in these conditions
   was a factor in their choosing another DSCM.
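
To make case (1) concrete, here is a minimal Python sketch of the
"infer it after the fact" approach.  It is purely illustrative: it is
not how git, bzr, or any posted patch series works.  The function
names (infer_dir_renames, remap_path) are made up for this example,
and it assumes that per-file rename detection has already produced a
map of old path to new path.  It only looks at a file's immediate
parent directory, so nested subdirectories are not handled.

```python
from collections import Counter
import posixpath


def infer_dir_renames(file_renames):
    """Guess directory renames from a {old_path: new_path} map.

    A file that keeps its basename but changes parent directory casts
    one vote for "old_dir was renamed to new_dir"; for each old
    directory, the target with the most votes wins.
    """
    votes = {}
    for old, new in file_renames.items():
        old_dir, old_name = posixpath.split(old)
        new_dir, new_name = posixpath.split(new)
        if old_name == new_name and old_dir != new_dir:
            votes.setdefault(old_dir, Counter())[new_dir] += 1
    return {d: c.most_common(1)[0][0] for d, c in votes.items()}


def remap_path(path, dir_renames):
    """Move a path touched by an incoming commit into the renamed directory."""
    old_dir, name = posixpath.split(path)
    new_dir = dir_renames.get(old_dir)
    return posixpath.join(new_dir, name) if new_dir else path


# Branch A: file renames found between the merge base and branch A.
renames_in_a = {
    "src/plugin/innodb-experimental/table.c": "src/plugin/innodb/table.c",
}
dir_renames = infer_dir_renames(renames_in_a)

# Branch B's commit touched these paths, using the old directory name.
incoming = [
    "src/plugin/innodb-experimental/table.c",
    "src/plugin/innodb-experimental/mod-schema.c",
]
remapped = [remap_path(p, dir_renames) for p in incoming]
```

Note that this sketch keys purely on directory names, so by itself it
says nothing about case (2), where the old name has been reused for an
unrelated directory.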

> Or "Detecting [Wholesame] Directory Renames"... which can be done
> using 'rename detection' paradigm, and we have patches to prove it![4]
> but unfortunately code didn't made it (yet!) into git.  And it can,
> I think, deal with splitting files into two directories, something
> which I guess in 'container identity' (directory-id) based solution
> is simply impossible

It may be that Yann Dirson's patches will handle case (1) above.
Handling case (2) is much harder, especially without slowing
everything down massively, since it would effectively mean needing to
look for directory renames along every single commit on the branch.
(This would obviously have to be cached in some cache file.)
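
A rough Python sketch of what "walk every commit, but cache the
result" could look like follows.  Everything here is an assumption
made for illustration: the helper names, the JSON cache format, and
the detect() callback (which stands in for whatever expensive
per-commit directory rename detection a real tool would run).  It also
assumes a simple linear history.

```python
import json
import os


def load_cache(path):
    """Load {commit_id: {old_dir: new_dir}} from a JSON file, if present."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_cache(path, cache):
    """Write the per-commit directory rename cache back to disk."""
    with open(path, "w") as f:
        json.dump(cache, f)


def dir_renames_for(commit_id, detect, cache):
    """Return one commit's directory renames, calling detect() only on a miss."""
    if commit_id not in cache:
        cache[commit_id] = detect(commit_id)
    return cache[commit_id]


def follow_directory(start_dir, commit_ids, detect, cache):
    """Track where start_dir ends up after replaying commits oldest-first.

    Because the name is followed one commit at a time, a later commit
    that creates an unrelated directory under the old name does not
    affect the answer.
    """
    current = start_dir
    for commit_id in commit_ids:
        renames = dir_renames_for(commit_id, detect, cache)
        current = renames.get(current, current)
    return current
```

The first walk pays for rename detection on every commit; later walks
over the same commits only hit the cache, which is the point of keeping
it in a file.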

It can be done, I'm sure, but it would require a lot of code to get
right.  Whether or not it's worth it is a question which is open to
debate, but I believe the bzr folks have asserted that bzr can handle
both cases (1) and (2) above, and there are some folks who apparently
care.  

Whether or not a particular open source project will really and truly
run into this problem is a different question, and one can argue that
renaming plugins, and then creating new plugins with the same name as
older plugins that have since been renamed will lead to programmer
confusion, and so that's a good enough reason to avoid doing such
crazy things.  Unfortunately, you know how some programmers
are.... telling someone they shouldn't do something is often an
invitation to do exactly what you tell them is a bad idea, and then
they complain when your filesystem or your DSCM doesn't handle that
case particularly gracefully.

						- Ted