Re: Comments on "Understanding Version Control" by Eric S. Raymond

Theodore Tso <tytso@xxxxxxx> · Thu, 5 Feb 2009 08:28:14 -0500

On Wed, Feb 04, 2009 at 10:24:57PM -0800, Junio C Hamano wrote:
> All the while, the development community started discussing how the source
> tree should be organized to support multiple backends, and you learned
> that the plan is to have one directory per larger backend, while keeping
> single file ones in <db/*.c>.  Specifically, you learned that innodb
> related code will be stored in <innodb/*.c>, and there may be other
> <somedb/*.c> and <someotherdb/*.c> groups added, but you are not
> interested in anything but enhancing innodb support.
> 
> You rename "scm mv db innodb" and then add <innodb/enhanced.c>, or perhaps
> you may have done it the other way, i.e. added <db/enhanced.c> and then
> renamed "scm mv db innodb".

The argument would be that for an SCM that properly tracked user
intentions, you did the wrong thing.  If the SCM properly understood
directory renames, there is a big difference between this:

	scm mvdir db innodb

and this:

	scm mv db/* innodb

You see?  The first moves the *directory* db to innodb.  The second
moves all of the *files* that are in db to a new directory, innodb.
If, in your example, you had learned that the goal was to keep single
file ones in <db/*.c>, and larger backends in <innodb/*.c>, the
correct thing to tell the SCM is *not* to rename the directory db to
innodb, but rather, to move all of the files currently in <db/*.c>,
which implement innodb, into the innodb directory.  If an SCM properly
handles directory renames, it would distinguish between these two
cases and record them differently, since they imply different
intentions about what should happen to new files created in <db/*.c> in
other branches when it comes time to merge them.
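
To make the merge implication concrete, here is a minimal sketch in
Python.  It is a toy model, not how any real SCM works: the
("mvdir" | "mv", src, dst) record and the function name are
hypothetical, invented only to show where a file added on another
branch would land under each recorded intention.

```python
def place_new_files(new_files, operation):
    """Decide where files added on another branch land after a merge.

    `operation` is a hypothetical record of what our branch did:
      ("mvdir", src, dst): the directory src itself was renamed to dst
      ("mv", src, dst):    the files then in src were moved into dst
    """
    kind, src, dst = operation
    placed = []
    for path in new_files:
        directory, _, name = path.rpartition("/")
        if kind == "mvdir" and directory == src:
            # The directory was renamed, so new files follow it.
            placed.append(dst + "/" + name)
        else:
            # Only existing files were moved; src is still a valid
            # home for anything created there on the other branch.
            placed.append(path)
    return placed
```

With a directory rename recorded, db/gdbm.c from the other branch
would be merged as innodb/gdbm.c; with a file move recorded, it would
stay at db/gdbm.c.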

Of course, this distinction does not exist in git, because we track
content only.  And a number of other SCMs like Hg, which only track
file renames, wouldn't get this right either.  In order to get this
right, you need to treat directory renames as separate and distinct
operations from file renames, because they have different merge
implications.
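
As a rough illustration of why a content-only view cannot see the
difference, the Python sketch below models a tree as a plain mapping
from path to content.  The function names and the sample file names
are made up for this example; this is not git's actual data model,
just a simplified snapshot.

```python
def rename_directory(tree, src, dst):
    """Rename directory src to dst: every path under src moves with it."""
    prefix = src + "/"
    return {
        (dst + "/" + path[len(prefix):] if path.startswith(prefix) else path): blob
        for path, blob in tree.items()
    }


def move_files(tree, src, dst):
    """Move each file sitting directly in src into dst, one by one."""
    moved = {}
    for path, blob in tree.items():
        directory, _, name = path.rpartition("/")
        moved[dst + "/" + name if directory == src else path] = blob
    return moved


# Hypothetical starting tree with a flat db/ directory.
before = {
    "db/btree.c": "btree code",
    "db/log.c": "log code",
    "README": "docs",
}

after_mvdir = rename_directory(before, "db", "innodb")
after_mv = move_files(before, "db", "innodb")

# Both operations leave exactly the same snapshot behind, so nothing
# in the resulting tree says which one the user intended.
same_snapshot = after_mvdir == after_mv
```

For a flat directory like this one the two snapshots compare equal,
so any difference in intent would have to be recorded somewhere other
than the tree contents.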

> See how that argument is flawed?  The point of my example is that the line
> between your example (1) and (2) in the previous message is blurry.

It's blurry if you don't properly make the distinction between file
and directory renames, yes.  An SCM that only handles file renames
can't record the difference between "move all the files in directory
foo to bar" and "rename directory foo to bar".  Just as an SCM (like
git) that only handles content can't tell the difference between "move
all of the lines of content from foo.c to bar.c" and "rename foo.c to
bar.c".

Our argument for git is that with sufficiently smart merge algorithms
it doesn't matter, since we can intuit the right thing at merge time.
However, your argument that it's not possible to determine whether the
new file should appear as db/gdbm.c or innodb/gdbm.c is an argument
that content-tracking alone isn't enough.

Personally, I think the scenario I used of renaming plugins is more
likely than the sort of source reorganization which you've posited,
but I agree they are both possible scenarios.  The question for git
development is whether these sorts of issues are ones that we should
try to handle or not.  After all, one possibility is just to tell
people that if they are folks who like to go wild with source tree
reorganizations all the time, they should go to some other SCM like
bzr or Hg; that in git's view, the cost of being able to handle
random file and directory renames isn't worth the benefit for what is
normally a rare occurrence (and if it's happening all the time, the
project is probably doing something else wrong....)

						- Ted