On Fri, 5 May 2006, Petr Baudis wrote: > > It's a philosophical question here, but I'd say that Git is much closer > to Monotone than to any other version control system Some historical background.. Before I dropped BK, I ended up being involved in trying to get Larry and Tridge to come to some agreement about how to solve the issues Tridge had with BK not being open-source. That actually went on for maybe two months or so, and I kept on hoping that we'd find some acceptably middle ground. I thought we could find somethign that would actually work for everybody: to hopefully both make BK technically better, _and_ to make the end result more palatable to the "free software or bust" contingency. One of the suggestions that I tried to push as an acceptable middle ground was to make a "generic" BK repository export format, so that people who didn't want to use BK could still get all the information, and not in a broken format like CVS (yes, CVS makes sense as an interchange format, since _everybody_ speaks CVS, but it's a horrible, horrible, horrible format from any technical standpoint). My example export format was really a strange mixture of patches with parenthood information, where the history information was described with hashes (MD5 rather than SHA1, but that was just an implementation thing, and mostly because BK used MD5 sums). Not something really useful as a real SCM, but it wasn't designed for that - it was just meant to be a useful and unambiguous interoperability format. Now, that didn't work out, and I was a little bummed. I thought it would have made both sides happy, because it would actually have been a better format than CVS (and yes, I'm somewhat biased: in my opinion, having a million monkeys throwing crap at the walls and encoding the information in the patterns on monkey shit is a better format than CVS), so it would actually have improved BK, while also making it possible to interoperate if you didn't want to use BK itself. But Tridge didn't believe that it would actually have exported all the information in a BK tree, even if both I and Larry told him it would. I'm not a hundred percent sure that Larry would have gone for the export format either, but hey, one sign of a good compromise is that neither side really gets what they really want. Whatever. It didn't work. So it didn't actually resolve the deadlock, but when it became clear that I couldn't work with BK any more, I thought I might use something like that "patch + parenthood" representation as a way to maintain my tree while looking at other alternatives. So in many ways, when I started looking around for distributed SCM's, I came into the game with the background of keeping the history around as chains of hashes describing it, and then just having patches to describe the differences between versions. So that was really my "fallback" position: if nothing out there worked, I'd rather go back to lists of patches than use CVS. Now, if you keep track of just patches, one of the issues is that you can't afford to re-create the tree every time by walking patches forward from the beginning, so I also was planning to have an "cache" that maintained the current state of the tree as a separate state from the working tree, so that I would always have the "working tree" and the "result of patches up to this moment" as two separate things (so that I could do the "bk diff" that I was used to doing to see the difference between my last state and the current state of the working tree). In other words, I was already working on the git "index" file. And I was planning to just have a patch-based system behind it, with a hashed history. Kind of "quilt with history and an index to speed things up". The index itself would be backed-up with whole files (all hidden in the ".dircache" directory), and the patch series would thus normally never actually be _used_. So the inefficiency of working with patches would never be much of an issue. A "commit" would create a new patch from the current working directory and the previous shadow tree, and update the shadow tree and add a new entry to the history list. And then I found Monotone. Now, monotone was slow. Monotone was so _horrendously_ slow that I had to do special hacks just to import _one_ version of Linux into it in less than two hours. It was something stupid like an O(N**3) algorithm in the number of filenames (and the kernel had 17,291 files at that time: v2.6.12-rc2), and it was just totally unusable for me. I also thought (and still think) that the whole signing thing was a waste of time and misdesigned, and I obviously am not a huge fan of databases. So in many ways I disliked the monotone implementation decisions (and some of its design decisions). But at the same time, I immediately liked the SHA1 object naming concept of Monotone. It also already matched how I had conceptually planned on doing on the history anyway, and had some ideas for, but it took that whole "history hashing" all the way. And thus git was born. So git really has three parents. In a very real sense, BK (or, perhaps more appropriately - the way I personally used BK, which is not necessarily how others have used it) was the biggest thing from the standpoint of what I wanted my _workflow_ to be like. It was simply how I had done things for the last few years, so a lot of my mental model for how things are supposed to _work_ came from BK. I still don't think people give Larry enough credit for actually pushing this whole distributed SCM thing as a _usable_ model. Very few of the open-source distributed SCM's are actually usable even today, and as far as I've been able to gather, the commercial ones aren't really any closer either. Larry didn't have the kind of examples of what _can_ work that I had. The other parent was the stupid "series of patches" model, which was what really resulted in the "index" thing. I realize that people don't always much like the index, but it's really a pretty central part of git history, and one of the distinguising marks of git. It may be trivial, and to some degree it's been overshadowed by all the tree operations we do (the combination of revision walking and tree diffing), but it was very central to how git came to be. The index also ended up being central to how we did merges - even if some day we may end up doing more of that on a pure tree level (ie the current git-merge-tree model), I think the way we ended up doing merges owes a lot to the index as a staging area. (Historically, the "index" was called the "cache". Exactly because it came from the notion of "caching" the top commit state in a patch series, and then working with patches either backwards or forwards from that top cached state. Similarly, we didn't have a ".git" directory: it was called ".dircache", exactly because it was all about caching the state of the previous commit directory layout). And finally, Monotone for the "everything is an object named by its SHA1" model, which to some degree is perhaps the central - or at least the most obvious - part of git. It largely was designed really just to be the "backing store" for the "cache", and to not be _that_ important. That also explains why I didn't worry too much about disk usage etc initially: the object store wasn't even the most important part, and I envisioned just moving old objects that weren't needed into some "backup storage" kind of thing. Linus - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html