On Sat, 21 Oct 2006 08:01:11 -0500, "Matthew D. Fuller" wrote:
> I think we're getting into scratched-record-mode on this.

I apologize if I've come across as beating a dead horse on this. I've
really tried to respond only where I'm still confused, or where there
are explicit indications that the reader hasn't understood what I was
saying, ("I don't understand how you've come to that conclusion", etc.).
I'll be even more careful about that below, labeling paragraphs as "I'm
missing something" or "Maybe I wasn't clear".

> G: So use revids everywhere.
>
> B: Revnos are handier tools for [situation] and [situation] for
> [reason] and [reason].

I'm missing something: I still haven't seen strong examples for this
last claim. When are they handier? I asked a couple of messages back and
two people replied that given one revno it's trivial to compute the
revno of its parent. But that's no win over git's revision
specifications, (particularly since they provide "parent of" operators).

> > It may be that the centralization bias
>
> I think it's more accurately describable as a branch-identity bias.
> The git claim seems to be that the two statements are identical, but I
> have some trouble swallowing that.

Maybe I wasn't clear: There's no doubt that semantic confusion over the
term "branch" has been confounding communication on both sides. Here's
my attempt to describe the situation, (which only became this clear
recently as I started playing with bzr more). This is not an attempt at
a complete description, but is hopefully accurate, neutral, and
sufficient for the current discussion:

Abstract: In a distributed VCS we are using a distributed process to
create a DAG, (nodes are associated with revisions and point to parent
nodes). The distributed nature means that the collective DAG will have
multiple source nodes, (often termed heads or tips).

Git: A subset of the DAG is stored in a "repository". The DAG in the
repository may have many source nodes.
A "branch" is a named reference to a node (whether or not a source).
Multiple local repositories may share storage for common objects. There
are inter-repository commands for copying revisions and adjusting branch
references, but basically all other operations act within a single
repository.

Bzr: A subset of the DAG is stored in a "branch". The DAG in the branch
has a single source node. Multiple local branches may share storage for
common objects through a "repository". Basically all operations (where
applicable) can act between branches.

Let me know if I botched any of that.

One concept that is really not introduced in the above is the colloquial
concept of a "branch" as a "line of development". In my experience, this
notion is a fundamentally short-lived thing. For example, work happens
on a feature branch for a while, and then it gets merged into the
mainline. After the merge, there's not that much significance to the
branch anymore. In a sense, it no longer exists but for a few edges in
the graph.

I imagine that git and bzr users both rely on this short-lived aspect in
practice. After merging, git users drop their branch references and bzr
users drop their directories containing their branches. Anything else
would be unwieldy, as the number of merged-in, "uninteresting" branches
would grow without bound and there wouldn't be any advantage to keeping
them around.

But dropping a merged branch in bzr means throwing away the ability to
reference any of its commits by its custom, branch-specific revision
numbers. And the revision numbers _do_ change: pull, branch, and merge
all introduce revision number differences between branches, (or changes
within a branch in the case of pull). And there is no simple way to
correlate the numbers between branches.

Maybe you can argue that there isn't any centralization bias in bzr. But
anyone who claims that the revnos are stable really is talking from a
standpoint that favors centralization.
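
To make the model above a bit more concrete, here is a minimal Python
sketch. It is a toy under stated assumptions, not how either tool is
implemented: the revision ids are made up, and mainline_revnos() only
numbers the first-parent chain from the root, ignoring bzr's dotted
revnos entirely. It shows that a merged DAG has a single source node,
and that the same number can name different revisions in two branches.

```python
# Toy model: each revision id maps to a tuple of parent ids (first parent first).
# The revision ids and the two "branches" below are hypothetical.
PARENTS = {
    "a": (),
    "b": ("a",),
    "c": ("b",),        # work on a feature line
    "d": ("b",),        # work on the main line
    "m": ("d", "c"),    # the main line merges the feature line
}


def heads(dag):
    """Return the source nodes: revisions that no other revision names as a parent."""
    referenced = {p for parents in dag.values() for p in parents}
    return sorted(set(dag) - referenced)


def ancestry(dag, tip):
    """Return the sub-DAG reachable from tip (what a single-tip branch would hold)."""
    seen = {}
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen[rev] = dag[rev]
            stack.extend(dag[rev])
    return seen


def mainline_revnos(dag, tip):
    """Number the first-parent chain ending at tip as 1, 2, 3, ... from the root.

    Simplifying assumption: merged-in revisions get no number at all here.
    """
    chain = []
    rev = tip
    while True:
        chain.append(rev)
        if not dag[rev]:
            break
        rev = dag[rev][0]
    chain.reverse()
    return {rev: number for number, rev in enumerate(chain, start=1)}


feature = mainline_revnos(PARENTS, "c")   # {"a": 1, "b": 2, "c": 3}
main = mainline_revnos(PARENTS, "m")      # {"a": 1, "b": 2, "d": 3, "m": 4}
```

In this toy, "3" means revision c in the feature branch and revision d
in the main branch, so the number alone says nothing unless you also
know which branch it was read from.
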
But here's a unifying point about git and bzr. Git also allows
branch-specific, unstable names for revisions. And they're even more
unstable than the ones bzr generates. But there are some important
differences between how they are used, (both by the tool and by people).

To illustrate, yesterday I gave an example where performing a bzr branch
from a dotted-decimal revision would rewrite the numbers from the
originating branch (1.2.2, 1.2.1, and 1) to unrelated numbers in the new
branch (3, 2, 1).

I was surprised at first, and couldn't imagine any sane reason for the
tool to go off and invent new names. It prevents a user of the new
branch from referencing any commits by their original names. It also
prevents the user from communicating with anyone using these new names,
(unless the user publishes the branch, and any parties to the
communication retain the new branch for as long as said communication
might be referenced).

But then I realized why bzr is doing this. It's because bzr users don't
just use the revision numbers for external communication; they also use
them for lots of direct interaction with the tool. The rewriting makes
it easy to write something like "bzr diff -r1..3".

And it turns out that git also allows branch-specific naming for the
exact same reason. In place of 3, 2, 1 in the same situation, git would
allow the names HEAD, HEAD~1, and HEAD~2 to refer to the same three
revisions. So the easy diff command would be "git diff HEAD~2 HEAD".
(And where I have HEAD here I could also use any branch name, or any
other reference to a commit.)

So there are two fundamentally different uses for names, (and Linus
recently talked about this at some length):

1. day-to-day working with the tool, and
2. externally communicating about specific revisions.

Both bzr and git allow for unstable, branch-specific names to be used as
a convenience in the case of the day-to-day working.
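
The correspondence between the two naming schemes can be sketched in a
few lines of Python. This is only an illustration under simplifying
assumptions: the history is strictly linear, and the revision ids
("rev-a" and so on) are placeholders I made up. Neither tool stores
history as a list like this.

```python
# Hypothetical linear history, oldest first; the ids stand in for real revision ids.
history = ["rev-a", "rev-b", "rev-c"]


def by_revno(history, revno):
    """Root-relative name: revno 1 is the oldest revision."""
    return history[revno - 1]


def by_head_offset(history, offset):
    """Tip-relative name: offset 0 is HEAD, offset 1 is HEAD~1, and so on."""
    return history[-1 - offset]


def names(history):
    """Pair each revision with both of its branch-specific names."""
    tip = len(history) - 1
    table = {}
    for index, rev in enumerate(history):
        offset = tip - index
        head_name = "HEAD" if offset == 0 else f"HEAD~{offset}"
        table[rev] = (index + 1, head_name)
    return table


before = names(history)             # rev-a is (1, "HEAD~2"), rev-c is (3, "HEAD")
after = names(history + ["rev-d"])  # rev-a is (1, "HEAD~3"), rev-d is (4, "HEAD")
```

Both schemes are just two ways of indexing the same chain: one counts
from the root, the other from the tip. Adding a commit leaves the
root-relative numbers of existing revisions alone in this linear case,
while every tip-relative name moves by one.
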
Maybe the reason some people dislike git's "ugly" names so much is that
they imagine that to compare two revisions a user of git must inspect
the logs, fish out the sha1sum for each, and then cut-and-paste to
create the command needed. I agree that if that were required, it would
be exceedingly painful. But that's not required. What the git user uses
is branch names and simple variations.

Now, there are some important differences in the unstable names that git
and bzr have. Most importantly, git's are even less stable, (with
respect to the association between a name and any specific revision).
With every commit, all of the git names effectively shift as the branch
moves, (HEAD points to the new commit, HEAD~1 points to what HEAD
previously pointed to). This is remarkably useful since it provides
stability in terms of what the user cares about, (the latest commit and
its closest ancestors). This means that "diff from grandparent to
current commit" is always "git diff HEAD~2 HEAD" whereas in bzr it is
"bzr diff -r<X-2>..<X>" and the user actually does need to look up X
first, (unless there's more to the bzr revision specification than I've
seen).

Finally, since these branch-specific names are changing all the time,
there's never any temptation for people to attempt to use them for
external communication. In contrast, by being numbered in the opposite
direction, bzr revision numbers give a false appearance of stability and
people _do_ use them for communication. This is the mistake we've been
warning bzr users about in this thread.

Also, since the git names are so predictable, git almost never emits
them. It accepts them as names just fine, but it doesn't generate them,
(log and commit never show the branch-specific names).
I think the only git command that can even emit such a name is the
recently added git-name-rev, which exists solely for the purpose of
mapping a commit identifier to a local, branch-specific name which might
have more intuitive meaning for the user. So the fact that things like
git-log don't print these names also helps users avoid the trap of
trying to communicate with something unstable.

-Carl