On Thu, 19 Oct 2006 10:58:48 -0400, Aaron Bentley wrote:
> >> In bzr development, it's very rare for anyone's revision numbers to change.
> >
> > Which just says to me that the bzr developers really are sticking to a
> > centralized model.
>
> I don't see why you're reaching that conclusion. I'd like to understand
> that better, because Linus seems to be concluding the same thing, and it
> doesn't make sense to me.

First, I want to point out that I think we're having a delightfully
enlightening conversation here, and I'm glad for that.

Let me provide a couple of hypothetical situations to try to
demonstrate my thinking here. The first is far-fetched but perhaps
easier to understand the implications. But the second is the real,
everyday situation that is much more important.

Far-fetched
-----------
Let's imagine there's a complete fork in the bzr codebase tomorrow. We
need not suppose any acrimony, just an amiable split as two subsets of
the team start taking the code in different directions.

Now, at the time of the fork, all published revision numbers apply
equally well to either team's codebase, (obviously, since they are
identical). But as the projects diverge they each start publishing
revision numbers with respect to their own repositories in their own
bug trackers, etc. Obviously, each project has its own "mainline" so
these new revision numbers are only unique within each project and not
between the two.

Time passes...

Finally the two teams (who had remained good friends after the
breakup) find a unifying theory that will let them work on a single
tool that will meet the needs of both user bases. So they want to
merge their code together.

After the merge, there can be only one mainline, so one team or the
other will have to concede to give up the numbers they had generated
and published during the fork. That is, the numbers will not be usable
within the new, merged repository.

Everyday
--------
Now, the above scenario is just silly.
It's not likely to ever happen, so it's really not worth considering
as a motivating case. But, what does (and should) happen every day is
exactly the same. So here's a realistic situation that is worth
considering:

An individual takes the bzr codebase and starts working on it. It's
experimental stuff, so it's not pushed back into the central
repository yet. But our coder isn't a total recluse, so his friends
help him with the code he's working on. They communicate about their
work, (perhaps on the main bzr mailing list), and make statements such
as "feature F is working perfectly as of version V".

But for these communications, revision numbers will not provide
historically stable values that can be used. It's impossible for our
coder to predict the numbers that will be assigned to his code when
they get merged back into the mainline---since some other unknown
programmer may have branched at exactly the same point and is trying
to make the same determination. Neither programmer can know which code
will land first, so neither can know what numbers will get assigned,
right?

Now, the programmers could get stable numbers by keeping the branch in
the main tree, or by at least pushing out the branching point to
"reserve" a number in the main tree. So, the only way to get stable
numbers is to rely on this central tree.

Does that make sense?

> That doesn't follow. Just because something is arguably true doesn't
> make it bad. And in this case, I'm not arguing that it's true, I'm
> saying that it's true, because that is what my experience tells me is true.

[I'm sorry, but I didn't grasp this sentence. I think I lost the
antecedent of "it" somewhere.]

> > In cairo, for example, we've made a habit of including a revision
> > identifier in our bug tracking system for every commit that resolves a
> > bug.
>
> We do it the other way around: we put a bug number in the commit
> message.

Oh, we do that too.
That number is important, (for "what the heck is this commit trying to
do, and why", since (sadly) much of the why ends up getting stuck off
in external bug tracking tools). But the reverse direction is also
important, ("Hey, this bug got fixed in the development version, but I
want to backport it to my distribution package. Where can I find
it?").

> And I personally have been developing a bugtracker that is
> distributed in the same way bzr is; it stores bug data in the source
> tree of a project, so that bug activities follow branches around.

That kind of thing sounds very useful. As I've been talking about
"numbers" here in bug trackers and mailing lists, it should be obvious
that I consider the information stored in such systems an important
part of the history of a code project. So it would be nice if all of
that history were stored in an equally reliable system in some way.

> On the other hand, I think your revision identifiers are not as
> permanent as you think.
>
> In the first place, it seems fairly common in the Git community to
> rebase. This process throws away old revisions and creates new
> revisions that are morally equivalent[1].

Yes, rebasing does "destroy history" in one sense, (in actual fact, it
creates new commits and leaves the old ones around, which may or may
not have references to them anymore). But it's definitely not common
for git users to use rebase in a situation where it would change any
published number.

For example, I regularly use git-rebase, (and similar "git-commit
--amend"), as I'm putting together a new branch that exists only in a
repository on my laptop with nobody having external visibility to it.

So, if I see a typo in a commit and I've never pushed it anywhere,
I'll just "git commit --amend" to fix it. But if I see that typo only
after I push out the change, then I just make a new commit to fix it,
(and suck up the fact that my mistake will be a permanent part of the
history).

And git helps with this as well.
If I ever forget that I've already pushed a change and then I rebase,
then the next time I try to push, git will complain that I'm
attempting to throw away history on the remote end, and will refuse to
cooperate, (unless I force it). There's a similar safety mechanism on
the pull side. If I did force a history-rewriting push, then users who
tried to pull it would also have to force git's hand before it would
rewrite their history.

[By the way, it is sometimes useful to make chaotic, regularly-rebased
branches visible to others, so they can watch what's going on. (Junio
does this with his "proposed updates (pu)" branch in his repository
for git itself, for example). It's just that such branches should
never be used to start new development if they expect to pull from the
branch again later, nor should the revision numbers of such a branch
ever be considered permanent, nor published anywhere.]

> In the second place, one must consider the "nuclear launch codes"
> scenario.

Sure. And git does provide tools that can do this. Of course, the
"normal" tools strictly add new commits and move branches (which are
no more than references to commits) around. But moving branches can
leave commits unreferenced. And a "prune" command does exist, (which
isn't needed in "normal" use), which will delete unreferenced objects.

-Carl

> [1] This is a process that I find discomforting, because I consider the
> original revisions to be real, historical data, and I don't like the
> idea of throwing it away.

As I mentioned above, they aren't thrown away. I often use rebase when
re-building an ugly series of patches into a nice clean set of
patches. And in that situation, I might rebase from the old to the
new, but still with a reference to the old branch until I'm done with
the entire process. And it's perfectly possible, and legitimate that
such a reference has been published and the old branch will live
"forever" even if I rebased it. So rebase isn't necessarily
destructive.
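
P.S. To make the "Everyday" argument above concrete, here is a toy
Python sketch. It is only an illustration under simplifying
assumptions, not how bzr or git actually compute anything: I pretend
that every commit landing on the mainline simply takes the next
sequential number, and that a content-derived identifier is a hash of
the parent's identifier plus the change. The function names
(content_id, build_branch, land_on_mainline) are made up for this
example.

```python
import hashlib


def content_id(parent_id, change):
    """Toy identifier derived only from the parent's id and the change."""
    data = f"{parent_id}\n{change}".encode()
    return hashlib.sha1(data).hexdigest()[:12]


def build_branch(base_id, changes):
    """Return content-derived ids for a series of changes on top of base_id."""
    ids = []
    parent = base_id
    for change in changes:
        parent = content_id(parent, change)
        ids.append(parent)
    return ids


def land_on_mainline(mainline_length, branches_in_landing_order):
    """Hand out sequential mainline numbers in the order branches land.

    Assumption: each landed commit takes the next free number. This is
    a simplification and not bzr's real numbering scheme.
    """
    numbers = {}
    next_number = mainline_length + 1
    for ids in branches_in_landing_order:
        for commit_id in ids:
            numbers[commit_id] = next_number
            next_number += 1
    return numbers


# Two programmers branch from exactly the same point.
base = content_id("", "branch point")
first_coder = build_branch(base, ["feature F, part 1", "feature F, part 2"])
second_coder = build_branch(base, ["unrelated fix"])

# The same work, landing in the two possible orders.
first_lands_first = land_on_mainline(100, [first_coder, second_coder])
second_lands_first = land_on_mainline(100, [second_coder, first_coder])

for commit_id in first_coder + second_coder:
    print(commit_id,
          first_lands_first[commit_id],
          second_lands_first[commit_id])
```

In this model the content-derived identifiers are fixed the moment the
work is committed, while the sequential numbers for the very same
commits differ depending on who lands first. That is the sense in
which neither programmer can know ahead of time which number to
publish.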