Re: dist-hg proof-of-concept ready for use

Carl Worth <cworth@xxxxxxxxxx> · Thu, 09 Nov 2006 00:38:13 -0800

On Wed, 08 Nov 2006 22:31:52 -0500, Havoc Pennington wrote:
> I guess we are getting off-topic ;-)

Ah well. But Fedora is in the process of deciding what to replace
dist-cvs with, so I think it's a useful conversation to have. [I did
have a short reply here at one point, but somehow it kept getting
longer and longer---I apologize in advance. Just read the c) d) and e)
bullets I added if you want the short version.]

> Agreed - other things matter also, is all.

Sure. Part of Keith's point is that all the other things are easy to
fix, but bugs in the repository format are not, (for example, if they
lead to corruption at some point, or fail to capture some useful piece
of information).

> I'm happy to just take your word that git has a great implementation
> (btw, I'm not trying to talk anyone in or out of using git, just trying
> to understand the appeal since various people have tried talking me into
> using it).

Sure. And I'm glad to share the ways I've benefited from it.

> As a start on the logging of new user impressions, this was one of the
> things I found confusing trying to use git ("is it doing anything?")

There's actually a funny thing that happened recently on this point. A
user asked for the command to switch from one branch to another in
his cairo repository. I gave him the "git checkout <branch>" command
he needed and his reply was:

	I proceeded with "git checkout <branch>" and that command
	instantly returned to the prompt. Is that normal?

Cairo's definitely not the biggest project around, but it is
impressive that even on much larger projects, operations like changing
from one branch to another can be so fast as to make the user think
that nothing has happened at all.

> That's kind of why I find myself asking about git - from a black box
> point of view, it looked the same (plus a lot of extra implementation
> leakage), but the "hype" if you will seems to imply more ("distributed
> workflow"? don't know).

The same as what? If you're looking for a modern replacement for the
horrors of cvs, (and subversion copied many of the horrors as a design
decision), then any of bzr, git, or hg will be a vast improvement. So,
yes, at one level they are all the same.

But projects do need to pick something at the end of the day, so
that's some of the motivation for debate that happens about even
some of the most minute details[*].

> The answers to "what's the appeal?" I understand so far are:
>   a) it has offline operation

Which applies to any of these three, (bzr, git, or hg), as compared to
cvs or svn.

And "offline" is more than "I can get work done on an airplane",
(though it's very nice for that). It also means things like "no
second-class citizens"---everybody gets access to the benefits of the
tool without needing to be blessed with commit access first.

It also reduces a barrier to committing, (since it makes it
faster). Anything that encourages smaller commits improves code
quality and usefulness of code history for debugging, etc. This kind
of thing really shouldn't be overlooked.

Additionally, a really important aspect of these distributed systems
is the fact that they allow people to just throw code back and forth
at each other and still manage it. I've heard a lot of people say
"Well, that's fine and all, but my project doesn't work like that---we
have a central repository." First, of course any established project
has a central repository, and yes, with any of these systems you will
still have a central repository.

But note that most code does not start out that way. Particularly
before a project has grown big enough for it to even be worth setting
up some central site for it, it's incredibly useful to be able to
start managing that code. And it's very useful for people to be able
to toss it around, without deciding an "owner" or "maintainer" or even
knowing if there will ever be any future code exchange with the other
person, (but if there is, the tool will help with merging that all
back up from the last common point).

A good real-world example I can point to on this front is the
gtk-theme-torturer code. I first saw this code as a tar-file linked to
from the gnome performance-list. If it had only lived as a tar file,
I'm sure I wouldn't have bothered sending fixes back to Manu for
it. But he gave me the code through git as well and it was perfectly
natural for me to just locally commit any fixes I made, (because it
was just so easy), even before I ever considered whether he might want
to see the changes I was making. Before I knew it I did have some
useful changes and Manu and I merged back and forth for a bit, (Manu
with 26 commits and me with 9).

The next thing I knew Benjamin Otte had me pull from his git tree
which added 41 commits on top of what I had. I have no idea what the
current state of the code is, but I imagine Manu coded some more and
probably easily merged all of that stuff from Benjamin as well.

This kind of sharing back and forth is really natural with a fresh
little pile of code like this, and I'm 100% convinced the code I
contributed never would have left my laptop if not for a distributed
source control system. The burden of things like "adding a committer"
to a centralized system is just way too expensive for little,
"unofficial" code like this, and it just wouldn't have happened. I
think Manu would have just been writing that code alone.

>   b) the implementation has good robustness

Yes. That git's repository format involves only creating new files
rather than appending, truncating, or modifying existing files is
something very comforting to me. And that git's structure guarantees
that no accidental or deliberate corruption of an existing object can
avoid detection is also important.
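
To make the corruption-detection point concrete, here is a minimal
sketch in Python (not git's actual code). It relies on one fact about
git: a blob's name is the SHA-1 of the header "blob <size>\0" followed
by the content. The ObjectStore class and its in-memory dict are
assumptions made purely for illustration; real git writes compressed
files under .git/objects.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id git gives a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy write-once store (hypothetical): objects are added, never modified."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # An existing object is never rewritten; identical content maps
        # to the same id, so storing it twice is a no-op.
        self._objects.setdefault(oid, content)
        return oid

    def get(self, oid: str) -> bytes:
        content = self._objects[oid]
        # The name is derived from the content, so any change to the
        # stored bytes shows up as a mismatch here.
        if git_blob_id(content) != oid:
            raise ValueError("object %s is corrupt" % oid)
        return content
```

Because the name of an object is computed from its bytes, a modified
object can no longer be found under its old name without the mismatch
being noticed on read.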

Some of the other things I would say are in git's favor:

c) performance

   I sometimes think Linus obsesses about git performance a little
   too much, but it really does have important impacts. That
   committing and changing branches are instantaneous operations makes
   it that much easier to use the tool rather than avoid it. This
   helps the tool help the user that much more. It really does improve
   the code.

d) history (and live branch) exploration

   In my previous message I tried to express some of this, but I doubt
   the message really sinks in until you've seen it in action. And
   more than either bzr or hg, git very much encourages multiple
   branches to exist in a repository and gives easy ways to reason
   about them. For example, the "trunk..branch" syntax that is used
   for naming a sequence of commits that exist in branch but not on
   the trunk is extremely useful. (Bzr copied the ".." syntax but
   missed this, and uses it only for a linear sequence; it doesn't
   have the "one branch to another" behavior that is what I use all
   the time in git.)
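
   The meaning of "trunk..branch" can be sketched as a set difference
   over the commit graph. This is only an illustration in Python, not
   how git implements it; the parents mapping and the commit names are
   made up for the example.

```python
def ancestors(parents, tip):
    """All commits reachable from tip (tip included) via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def commit_range(parents, trunk, branch):
    """Commits reachable from branch but not from trunk ('trunk..branch')."""
    return ancestors(parents, branch) - ancestors(parents, trunk)


# Hypothetical history: trunk is A-B-C, and a branch forked at B adds X-Y.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "X": ["B"],
    "Y": ["X"],
}

print(sorted(commit_range(parents, "C", "Y")))  # ['X', 'Y']
print(sorted(commit_range(parents, "Y", "C")))  # ['C']
```

   Swapping the two names gives the commits on the trunk that the
   branch has not seen yet, which is what a linear-only reading of
   ".." cannot express.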

e) git bisect

   This tool is another phenomenal invention from git. People are
   first introduced to this as something that helps pinpoint a bug in
   a linear sequence of commits through a binary search. It does
   provide that, but it also does something much more important, (and
   I don't know if the hg and bzr copies of bisect have this next
   part). Namely, you can bisect using two different branch heads. For
   example, when I recently found some weird behavior in cairo's
   stable branch, I immediately said "the development branch doesn't
   have that behavior" and started a git-bisect session using the two
   branch heads as the good and bad starting points. And git-bisect
   does exactly the right thing, (subdividing the DAG at each point as
   close to in half as possible), until it pointed me at exactly the
   buggy commit.
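
   A rough sketch of bisecting between two branch heads, in Python.
   This is not git's implementation; it only illustrates the idea of
   repeatedly testing the commit that splits the remaining suspects as
   evenly as possible. The history, the commit names, and the is_bad
   callback are all hypothetical, and the sketch assumes that once the
   bug is introduced every descendant commit has it.

```python
def ancestors(parents, tip):
    """All commits reachable from tip (tip included) via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def find_first_bad(parents, good, bad, is_bad):
    """Narrow down the first bad commit between a good and a bad head."""
    goods = [good]
    while True:
        # Suspects: ancestors of the bad commit not already known good.
        cleared = set()
        for g in goods:
            cleared |= ancestors(parents, g)
        suspects = ancestors(parents, bad) - cleared
        if len(suspects) == 1:
            return bad  # every parent of `bad` is good, so this is it

        def balance(commit):
            # How evenly testing this commit would split the suspects.
            n = len(ancestors(parents, commit) & suspects)
            return min(n, len(suspects) - n)

        pick = max(sorted(suspects - {bad}), key=balance)
        if is_bad(pick):
            bad = pick
        else:
            goods.append(pick)


# Hypothetical history: the development branch (D1, D2) and the stable
# branch (S1, S2, T1, S4) both fork from B; S4 merges S2 and T1.
parents = {
    "A": [], "B": ["A"],
    "D1": ["B"], "D2": ["D1"],
    "S1": ["B"], "S2": ["S1"], "T1": ["S1"], "S4": ["S2", "T1"],
}


def is_bad(commit):
    # Stand-in for building and testing a commit: the bug arrived in T1.
    return commit in {"T1", "S4"}


print(find_first_bad(parents, good="D2", bad="S4", is_bad=is_bad))  # T1
```

   Here the good and bad starting points sit on different branches,
   and the search still works because it operates on the set of
   commits reachable from one head and not the other.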

So there's my attempt at some aspects of the tool focused on how you
actually use it rather than details of what the implementation or
storage is.

There are a couple of potential negatives for git. One, it doesn't
have a "native" Windows client, (it works with cygwin but I'm told
that doesn't really "count" for Windows users---I still don't see how
something like python seems to "count" as native in contrast, but
maybe it's just because I'm not a Windows user). Two, it currently
requires all history to be retrieved initially. This can be a bit of a
burden, but there is current work for "shallow clone" to make this
problem go away.

Strategies for handling renames are different in the various tools. As
best as I can tell, there exist plausible scenarios in which any given
tool won't support renames as well as another.

-Carl

[*] As an example there was a hundreds-of-messages-long thread
cross-posted to git and bzr lists recently that centered largely on
whether bzr deserved to get a point over git on a comparison table for
"simple namespace", (1.2.15.3 vs. 384191d3). I can certainly
appreciate a "who cares?!" response to a thread like that.
--
Fedora-maintainers mailing list
Fedora-maintainers@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-maintainers