On Thu, 19 Oct 2006 21:06:40 -0400, Aaron Bentley wrote:
> I understand your argument now.

Well, I'm glad to know we each feel like we are communicating at times, here.

> It's nothing to do with numbers per se,
> and all about per-branch namespaces. Correct?

The entire discussion is about how to name things in a distributed system. The premise that Linus has put forth in a very compelling way, is that attempting to use sequential numbers for names in a distributed system will break down. The breakdown could be that the names are not stable, or that the system is used in a centralized way to avoid the instability of the names.

Now, that causality might not accurately describe the way bzr has developed. It may be that the centralization bias was determined by other reasons, and that given those, using sequential numbers for names makes perfect sense. But it really is fundamental and unavoidable that sequential numbers don't work as names in a distributed version control system.

> I meant that the active branch and a mirror of the abandoned branch
> could be stored in the same repository, for ease of access.

Granted, everything can be stored in one repository. But that still doesn't change what I was trying to say with my example. One of the repositories would "win" (the names it published during the fork would still be valid). And the other repository would "lose" (the names it published would be not valid anymore). Right?

Now, maybe there's some "simple" mapping from old names to new names for the losing repository, (something like adding a prefix of "losers/" to the beginning of the names or something or adding a "15." prefix or whatever). The point is that the old names are invalidated. And there's no way to guarantee this kind of change won't happen in the future, (no matter how old a project is).
I constructed that example to show that the naming has a social impact in forcing a distinction between winners and losers in the merge, (or mainline and side branch, or whatever you want to name the distinction). The two re-joining projects could be really amiable, create a new virgin mainline and treat both histories as side branches. In this version, everyone loses as all the old names are invalidated.

> Bazaar encourages you to stick lots and lots of branches in your
> repository. They don't even have to be related. For example, my repo
> contains branches of bzr, bzrtools, Meld, and BazaarInspect.

Git allows this just fine. And lots of branches belonging to a single project is definitely the common usage. It is not common (nor encouraged) for unrelated projects to share a repository, since a git clone will fetch every branch in the repository. It is common for a single base URL to provide a common basis for a hierarchy of git repositories, (see, for example http://repo.or.cz/), and that may provide similar benefits.

I'm noticing another terminology conflict here. The notion of "branch" in bzr is obviously very different than in git. For example the bzr man page has a sentence beginning with "if there is already a branch at the location but it has no working tree". I'm still not sure exactly what a bzr branch is, but it's clearly something different from a git branch, (which is absolutely nothing more than a name referencing a particular commit object).

[Note: after playing with it a bit more down below, a bzr "branch" appears to be something like a git "repository" that can only hold a single branch.]

> I can see where you're coming from, but to me, the trade-off seems
> worthwhile. Because historical data gets less and less valuable the
> older it gets. By the time the URL for a branch goes dark, there's
> unlikely to be any reason to refer to one of its revisions at all.

I strongly disagree on this point.
First, I don't think that the "time for a branch to go dark" is necessarily long, (or if it is, then that's another barrier that's set up against distributed development---people have to have a long-term repository before they can usefully start publishing a branch).

Second, I'm not comfortable with any limit on usefulness of history. Would you willingly throw away commits, mailing list posts, or closed bug reports older than any given age for any projects that you care about?

> When you create a new branch from scratch, the number starts at zero.
> If you copy a branch, you copy its number, too.
>
> Every time you commit, the number is incremented. If you pull, your
> numbers are adjusted to be identical to those of the branch you pulled from.
>
> Is that really complicated?

OK. So now I had to actually try things out. I went ahead and installed bzr and was able to init and commit from the man page. I had to go to IRC to figure out how to create and change branches, (the documentation for "bzr branch" just said FROM_LOCATION and TO_LOCATION and I couldn't figure out what to pass for those).

Here's the setup I came up with for a tweaked version of the a[bc]m diamond example I showed with git earlier, (I just added a second commit to each branch before merging):

	mkdir bzrtest; cd bzrtest
	mkdir master; cd master; bzr init
	touch a; bzr add a; bzr commit -m "Initial commit of a"
	cd ..
	bzr branch master b; cd b
	touch b; bzr add b; bzr commit -m "Commit b on b branch"
	echo "change" > b; bzr commit -m "Change b on b branch"
	cd ..
	bzr branch master c; cd c
	touch c; bzr add c; bzr commit -m "Commit c on c branch"
	echo "change" > c; bzr commit -m "Change c on c branch"
	cd ../master
	bzr merge ../b; bzr commit -m "Merge in b"
	bzr merge ../c; bzr commit -m "Merge in c"

First, I've been told that this is a lot less efficient than possible since I have what in bzr terms is three unshared "branches" here, (what git would really call three separate "repositories").
Second, I think that using the filesystem for separating branches is a really bad idea. One, it intrudes on my branch namespace, (note that in many commands above I have to use things like "../b" where I'd like to just name my branch "b"). Two, it prevents bzr from having any notion of "all branches" in places where git takes advantage of it, (such as git-clone and "gitk --all"). Three, it certainly encourages the storage problem I ran into above, (and I'd be interested to see a "corrected" version of the commands above to fix the storage inefficiencies).

But anyway, those are all new topics, what we were trying to talk about is revision numbers. After the above commands I can run bzr log in my three branches, master, b, and c and I get the following revision number sequences:

	master: 1 2 3
	     b: 1 2 3
	     c: 1 2 3

And from this state if I ask questions with bzr missing and look at just the revision numbers, then the answers are useless. I get answers like:

	.../b:$ bzr missing ../c
	You have 2 extra revision(s):
	revno: 3  Change b on b branch
	revno: 2  Commit b on b branch
	You are missing 2 revision(s):
	revno: 3  Change c on c branch
	revno: 2  Commit c on c branch

	.../b:$ bzr missing ../master
	You are missing 2 revision(s):
	revno: 3  Merge in c
	revno: 2  Merge in b

So there we have the revision numbers 2 and 3 each being used to name three different revisions. That's a lot of aliasing already. Then, if the b and c branches each treat master as their mainline and each pull, then both branches get their numbers all shuffled.

Oh, drat. I just realized that I'm running 0.11 here which doesn't have the dotted-decimal numbers. (I'm trying to get bzr.dev too, but it appears to be stuck about 40% of the way through "Fetch phase 1/4" [Note: it ). In this version, the commits brought in as part of a merge don't get any "simple" number at all and instead "bzr log" shows a merge ID.
I hadn't realized that the dotted decimal notation was so new that the community hadn't had a lot of experience with it yet. But, your description doesn't actually presume that notation. What you asked was:

> When you create a new branch from scratch, the number starts at zero.
> If you copy a branch, you copy its number, too.
>
> Every time you commit, the number is incremented. If you pull, your
> numbers are adjusted to be identical to those of the branch you pulled from.
>
> Is that really complicated?

And to answer: that description doesn't describe at all what happens to the "simple" numbers of commits that are merged. In the version I have, they disappear and get replaced with "ugly" numbers. In 0.12 something else happens instead, (that's the part I don't understand yet).

And my argument isn't just "confusing", it's "confusing or useless". I understand that pull destroys numbers, and how, but that makes the numbers I had generated earlier useless. I still don't understand how people can avoid number changing, (since pull seems the only way to synch up without infinite new merge commits being added back and forth).

So, yes, it really is complicated or my brain is just too small.

> > The naming in git really is beautiful and beautifully simple.
>
> Well, you've got to admit that those names are at least superficially ugly.

Sure. But I'll gladly take a simple system with superficial warts rather than a complex system with superficial beauty.

> What's nice is being able see the revno 753 and knowing that "diff -r
> 752..753" will show the changes it introduced. Checking the revo on a
> branch mirror and knowing how out-of-date it is.

With git I get to see a revision number of b62710d4 and know that "diff b62710d4^ b62710d4" will show its changes, though much more likely just "show b62710d4".

I really cannot fathom a place where arithmetic on revision numbers does something useful that git revision specifications don't do just as easily. Anybody have an example for me?
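As an aside on the "b62710d4^" point, here is a minimal Python sketch of why no arithmetic is needed. It is a toy model and not git's real object format: the `Store` class, `commit_id`, and `introduced_by` are hypothetical names made up for this illustration. Each commit's name is derived from its content and each commit records its parent, so "the change introduced by X" is just the pair (parent of X, X).

```python
import hashlib


def commit_id(parent_id, message):
    """Derive a name from content (parent name + message). Toy model only."""
    data = f"{parent_id or ''}\n{message}".encode()
    return hashlib.sha1(data).hexdigest()


class Store:
    """Hypothetical commit store keyed by content-derived names."""

    def __init__(self):
        self.parents = {}   # commit id -> parent id (None for the root)
        self.messages = {}  # commit id -> commit message

    def commit(self, parent_id, message):
        cid = commit_id(parent_id, message)
        self.parents[cid] = parent_id
        self.messages[cid] = message
        return cid

    def parent(self, cid):
        """Rough analogue of `<name>^`: follow the recorded parent link."""
        return self.parents[cid]

    def introduced_by(self, cid):
        """The two names a diff would compare to show what `cid` changed."""
        return (self.parent(cid), cid)


store = Store()
a = store.commit(None, "Initial commit of a")
b = store.commit(a, "Commit b on b branch")
c = store.commit(a, "Commit c on c branch")

# Both children resolve their "previous" revision through the link,
# regardless of which branch they were committed on.
print(store.introduced_by(b))
print(store.introduced_by(c))
```

The lookup works the same way in any copy of the store, since nothing about it depends on the order in which a particular branch happened to see the commits.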
-Carl

PS. The "bzr branch" of bzr.dev did eventually finish. I can see the dotted-decimal numbers in my example now, (1.1.1 and 1.1.2 for the commits that came from branch b; 1.2.1 and 1.2.2 for the commits that came from branch c). At 5 characters apiece these are well on their way to getting just as "ugly" as git names, (once it's all cut-and-paste the difference in ugliness is negligible).

And now, I see it's not just pull that does number rewriting. If I use the following command (after the chunk of commands above):

	cd ..; bzr branch -r 1.2.2 master 1.2.2

It appears to just create newly linearized revision numbers from whole cloth for the new branch (1, 2, and 3 corresponding to mainline 1, 1.2.1, and 1.2.2). That's totally surprising, very confusing, and would invalidate any use I wanted to make of published revision numbers for the mainline branch while I was working on this branch. See? This stuff really doesn't work.

Motivating scenario for the above: Imagine 1.2.3 committed garbage so I want to fix it by branching from 1.2.2 rather than the mainline "2". Then after I branch, I learn something about "1.2.1" that I want to investigate more closely. I try to inspect that in my branch, but ouch! I don't have that revision. Is there even a way to say "show me the change introduced by what is named '1.2.1' in the source branch in this scenario"?

Note: In #bzr I just learned that there is a way for me to do this _if_ I also happen to have a pull of the original branch somewhere on my machine. Something like:

	bzr diff -r1.2.0:../master -r1.2.1:../master

I don't know if there's a way to get diff's .. notation to work with that, (I can't manage to). But these simple numbers are getting less simple all the time.
With git, if I find a revision number somewhere, I can cut-and-paste it and get the right thing:

	git show b62710d4f8602203d848daf2d444865b611fff09

But with bzr if I find "1.2.1" somewhere I'm likely to type:

	bzr diff -r1.2.0..1.2.1

If I'm lucky, then that fails with:

	bzr: ERROR: Requested revision: '1.2.0' does not exist in branch:

and I go back to the source, find out what branch it was referring to, remember where that is on my machine (../master, say), and manually type that to my command line to get:

	bzr diff -r1.2.0:../master -r1.2.1:../master

If I'm unlucky then the first diff comes up with some unrelated commit and I get to be confused before I go through that same process.

Now do you see? It really, really does not work. This stuff is about as un-simple as could be, and these things will happen.
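For anyone who wants to poke at the aliasing without installing bzr, here is a rough Python sketch of the master/b/c example from earlier in this message. It is a deliberately simplified model and not bzr's actual numbering algorithm: the assumption is only that a branch's revno N means "the Nth revision along that branch's own line", and `Branch` and `content_name` are hypothetical names invented for the sketch.

```python
import hashlib


def content_name(parents, message):
    """Name a revision by hashing its parents and message (toy model)."""
    data = ("\n".join(parents) + "\n" + message).encode()
    return hashlib.sha1(data).hexdigest()[:8]


class Branch:
    """Hypothetical branch: revno N is the Nth entry on its own line."""

    def __init__(self, mainline=()):
        self.mainline = list(mainline)

    def commit(self, message, merged=()):
        tip = [self.mainline[-1]] if self.mainline else []
        name = content_name(tip + list(merged), message)
        self.mainline.append(name)
        return name

    def copy(self):
        # Copying a branch copies its numbers, as in the quoted description.
        return Branch(self.mainline)

    def revno(self, n):
        return self.mainline[n - 1]


master = Branch()
master.commit("Initial commit of a")

b = master.copy()
b.commit("Commit b on b branch")
b.commit("Change b on b branch")

c = master.copy()
c.commit("Commit c on c branch")
c.commit("Change c on c branch")

master.commit("Merge in b", merged=[b.revno(3)])
master.commit("Merge in c", merged=[c.revno(3)])

branches = {"master": master, "b": b, "c": c}
for n in (1, 2, 3):
    # A bare number only resolves once you also say which branch you mean.
    print(n, {label: br.revno(n) for label, br in branches.items()})
```

In this model revno 1 resolves to the same revision everywhere, while 2 and 3 each resolve to three different revisions depending on the branch. The content-derived name on the right-hand side needs no such context.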