On Wed, 23 Jul 2008, Ingo Molnar wrote:
>
> I've got the following, possibly stupid question: is there a way to
> merge a healthy number of topic branches into the master branch in a
> quicker way, when most of the branches are already merged up?
>
> Right now i've got something like this scripted up:
>
>     for B in $(git-branch | cut -c3- ); do git-merge $B; done
>
> It takes a lot of time to run on even a 3.45GHz box:
>
>     real    0m53.228s
>     user    0m41.134s
>     sys     0m11.405s

This is almost certainly because a lot of your branches are a long way
back in the history, and just parsing the commit history back to them is
slow.

For example, doing a no-op merge of something old like v2.6.24 (which is
obviously already merged) takes half a second for me:

    [torvalds@woody linux]$ time git merge v2.6.24
    Already up-to-date.

    real    0m0.546s
    user    0m0.488s
    sys     0m0.008s

and it gets worse the further back in history you go (going back to
2.6.14 takes a second and a half - plus any IO needed, of course).

And just about _all_ of it is literally just unpacking the commits as you
start going backwards from the current point, eg:

    [torvalds@woody linux]$ time ~/git/git merge v2.6.14
    Already up-to-date.

    real    0m1.540s

vs

    [torvalds@woody linux]$ time git rev-list ..v2.6.14

    real    0m1.407s

(The merge loop isn't quite as optimized as the regular revision
traversal, so you see it being slower, but you can still see that it's
roughly in the same class).

The merge gets a bit more expensive still if you have enabled merge
summaries (because now it traverses the lists twice - once for merge
bases, once for logs), but that's still a secondary effect (ie it adds
another 10% or so to the cost, but the base cost is still very much about
the parsing of the commits).

In fact, the two top entries in a profile look roughly like:

    102161   70.2727  libz.so.1.2.3   libz.so.1.2.3   (no symbols)
      7685    5.2862  git             git             find_pack_entry_one
    ...

ie 70% of the time is just purely unpacking the data, and another 5% is
just finding it. We could perhaps improve on it, but not a whole lot.

Now, quite frankly, I don't think that times on the order of one second
are worth worrying about for _regular_ merges, and the whole (and only)
reason you see this as a performance problem is that you're basically
automating it over a ton of branches, with most of them being old and
already merged.

But that also points to a solution: instead of trying to merge them one
at a time, and doing the costly revision traversal over and over and over
again, do the costly thing _once_, and then you can just filter out the
branches that aren't interesting.

So instead of doing

    for B in $(git-branch | cut -c3- ); do git-merge $B; done

the obvious optimization is to add "--no-merged" to the "git branch"
call. That itself is expensive (ie doing "git branch --no-merged" will
have to traverse at least as far back as the oldest branch), so that
phase will be AT LEAST as expensive as one of the merges (and probably
quite a bit more: I suspect "--no-merged" isn't very heavily optimized),
but if a lot of your branches are already fully merged, it will do all
that work _once_, and then avoid it for the merges themselves.

So the _trivial_ solution is to just change it to

    for B in $(git branch --no-merged | cut -c3- ); do git-merge $B; done

and that may already fix it in practice for you, bringing the cost down
by a factor of two or more, depending on the exact pattern (of course, it
could also make the cost go _up_ - if it turns out that none of the
branches are merged).
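One possible way to write that a bit more defensively (just a sketch, not
anything the above requires; it assumes branch names without whitespace)
is to compute the unmerged list a single time and stop at the first merge
that actually fails:

    # Do the expensive traversal once: list the not-yet-merged branches.
    unmerged=$(git branch --no-merged | cut -c3-)

    # Merge them one at a time, stopping at the first merge that fails
    # (e.g. a conflict), so later branches aren't attempted on top of a
    # half-finished merge.
    for B in $unmerged; do
        git merge "$B" || break
    done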
Other solutions exist, but they get much uglier. Octopus merges are more
efficient, for example, for all the same reasons - they keep the commit
traversal in a single process, and thus avoid having to re-parse the
whole history down to the common base. But they have other problems, of
course.

		Linus
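For illustration, a minimal sketch of the octopus variant mentioned above
(an assumption-laden example, not a recommendation: it relies on "git
branch --no-merged" being available and on none of the branches needing
manual conflict resolution, since the octopus strategy bails out on
non-trivial merges):

    # List the unmerged branches once, then hand them all to a single
    # "git merge" invocation so the history is parsed only once and one
    # octopus merge commit is created.
    unmerged=$(git branch --no-merged | cut -c3-)
    test -n "$unmerged" && git merge $unmerged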