On Thu, 19 Oct 2006, Matthieu Moy wrote:
>
> By curiosity, how would you compare git and Bitkeeper, on a purely
> technical basis? (not asking for a detailed comparison, but an "X is
> globaly/much/terribly/not better than Y" kind of statement ;-) )

I think git is better for kernel work these days, but a large portion of that is that a lot of the features have literally been tweaked for us (for very obvious reasons).

For example, the whole "rebase" thing (or explicitly making cherry-picking easy) is something that a number of kernel people do, and even if I have to admit to not liking the practice very much (it kind of hides the "true" development history), it does have huge advantages, and it makes history a lot easier to read.

Similarly, I often used the single-file graphical history viewing in BK ("revtool"), but being able to follow the history of multiple files as one "entity" really is something that once you get used to, it's really really hard going back, and "gitk" does generate a much more readable graph.

And I think the git way of doing branches is just simply superior. Git always did branches in the sense that the way merges happened you _always_ had several heads, but actually making them available and switching between them was something that wasn't my idea, and that I even was a bit apprehensive about. I was wrong. Git branches are branches done right. I just don't see how you _could_ do them better.

That said, a lot of the features I like and _I_ consider really important are possibly not that important to others. For example, maybe nobody else really cares about viewing the history of a particular subsystem, the way I do. For a lot of people, single-file is probably ok.

For example, while git now does "annotate" (or "blame"), it's not lightning fast, and I simply don't care. Doing a

    git blame kernel/sched.c

takes about three seconds for me, and that's on a pretty good machine (and on the kernel tree, which for me is always in the cache ;).
Quite frankly, if I cared deeply about that kind of annotation, I'd probably be upset about it. There are basically _no_ other git operations that take that long. I can get the _full_ log of the last 18 months of the kernel much faster than that.

And the slowness of annotate comes directly from the design of git, and from the fact that it's not how I tend to look at changes. Rather than doing "git blame kernel/sched.c", I'm _much_ more likely to just do

    git log -p kernel/sched.c

and see the changes as individual patches instead (and perhaps search for some pattern that I'm looking for by just literally using a regex in the pager).

Also, the fact that you need to repack the archive every once in a while doesn't disturb me. I probably end up repacking the kernel almost daily, which is _waay_ excessive, but it's just become habit of mine. I've seen people who really don't like it, and I've also seen people who apparently never even realized that they should do an occasional "git repack -a -d", and then they have hundreds of thousands of loose objects and wonder why the performance is so bad ;)

BK never had these issues. BK always kept things "packed", which made a lot of operations much slower ("bk undo" was painfully slow). BK could annotate quickly, since it was really a file-based history, in a way that git fundamentally isn't, and can never be (and I don't _want_ it to be, but it means that "annotate" is slow).

And BK had some great tools. The merge tool was superior ("bk resolve"? I forget). The patch-application tool was great.

But both of those tools are things that git doesn't have, for _another_ reason: the way git works, you don't really need them.

For example, the patch application tool was great, but the biggest reason it was needed in the first place was that BK tracked renames explicitly. In that kind of environment, you have serious problems with patches, and you actually _need_ a tool to let the user explain when something is a rename and when it isn't.
With git not tracking renames, the patch application tool simply isn't needed.

The same goes to some degree to "bk resolve". Because git has the index, and you can _leave_ things unresolved in the index, you don't need a graphical tool to resolve things - git knows very fundamentally about incomplete merges _and_ about multiple branches (which you need in order to keep track of both the branch you merge from and the branch you merge into), and it's fine to resolve any conflicts in the normal working tree.

So for at least _my_ usage, git does everything very well, but that's because if it didn't fit me, I fixed it until it did.

And "git bisect" really does rock. I still cannot believe that apparently nobody did it before us. It's such a useful thing, and it works so well in unambiguous cases (and not all cases are that unambiguous, but an appreciably large subset is).

So that said, git does work very well for us, but I do want to end on a note on things that BitKeeper did and nobody else has:

 - Larry was first. The undeniable fact is, that before BK (and for several years _after_ BK), the open-source alternatives were just CRAP. You can say anything you like about his personality, but dammit, compared to Larry, most people I know are idiots.

   People don't give BK the credit it deserves. When Tridge "reverse-engineered" it, people were making jokes about how trivial some of the protocols were. That misses the point ENTIRELY. The point is, compared to BK, everything else absolutely _sucked_, and BK really was a watershed program. Never EVER underestimate how important BK was.

   Quite frankly, I think most open-source SCM's _still_ suck. I'm constantly amazed that anybody would touch SVN with a ten-foot pole. Talk about crap. And SVN is at least usable, unlike a lot of other projects.

 - When I did git, one of the things that actually _helped_ me was that I was consciously trying to not do a BK clone.
   I wanted to do the same things that BK did, but I very much did _not_ want to do them the _way_ BK did them. I respect Larry too much, and I didn't want there to be any question about git being just a "clone".

   So a lot of the git design ended up very much trying to avoid old designs on purpose, and I think that really helped. The fact that I didn't have a background in SCM's, and that I thought all the weaves etc were confusing, meant that I instead went for a radically different way of doing things.

   And I'm 100% convinced that "radically different" was the right thing to do. That was what allowed git to really soar. A lot of the good things in git come exactly from the fact that git does _not_ do things like most traditional SCM's do.

   But BK should still get a lot of credit, because it was what taught me (and a lot of other people) what being "distributed" really meant.

 - On a more personal note: people say that BK showed the "failure" of using a commercial closed-source program. I would disagree. Not only did the kernel get a whole lot of useful work out of BK, we learnt how distributed systems _should_ work, and quite frankly, I'd do it all over again in a heartbeat.

   If there was a "failure" in the BK saga, it was in how horrendously _bad_ all open-source SCM's were, even with BK showing how it should have been done for several years. THAT is the failure. The fact that there were hundreds of people who whined about BK, and nobody really did anything productive.

Now, I'm obviously biased, but I really do believe that git is the best open-source SCM there is, by a _mile_.

I don't know how many people realize this, but we literally haven't changed our data formats in over a year. I was looking at my old git import of the BKCVS tree today, because I wanted to look up the "BKrev" format for the email earlier in this thread, and I realized that the pack-file was from July of last year. That's within a few _weeks_ of the pack-file being introduced at all, and guess what?
It all still worked. No "on-the-fly format conversion", no _nothing_. It just worked.

That should tell people something. It's pretty much the fastest SCM out there (and yeah, that's on almost any operation you can name), it still has the smallest disk footprint I've ever heard of, and it hasn't had the "format of the week" disease that every other project seems to go through. And it's used in production settings on some of the biggest projects out there.

SVN has more users, but let's face it, SVN really isn't even in the running. Technology-wise, the thing is just not worth bothering with, but it's a good crutch for people who are used to CVS and never want to use anything else.

Am I happy with git? I'm happy as a clam. It turned out even better than I ever thought it would. And BK was what taught me what to aim for.

        Linus