"Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes: > Junio C Hamano <junkio@xxxxxxx> wrote: >> * git-rebase with -m is dog slow. There were people who >> advocated to make it the default, but they probably are >> either working in a very small project, or working on a >> filesystem that even git-apply is slow that the speed >> difference does not matter to them. > ... > But that's not the situation everyone else has, so its reasonable > that -m ain't the default. ;-) Well, that is not the conclusion you should be drawing from this. If rebase -m is 10x slower than without -m in cases where the rename handling does not matter, there is something wrong. And what is wrong in this case is that the unpack-trees tree merging code, which is used everywhere in git to do branch switching and merges, is way too inefficient. When merge-recursive is instructed to merge another tree with the current tree using an ancestor, while taking the index into account, it basically does the three-way tree-level merge one path at a time, even when subdirectory at quite high level matches identically across three trees. The situation is the same for switching branches. If two branches of the kernel project (22k files spread across 1300 directories) differ at a file at the toplevel (e.g. v2.6.21 which changes only Makefile), we still read the index, the current tree, and the other branch, and match all 22k files one by one to compute the resulting index entry, by first removing the current index entry and then stuffing the result entry in the index, all the while trashing the cache-tree. Then we recompute all 1300 tree objects and write them out, even though we should be able to notice that none of the toplevel 17 subdirectories have changed, and all we have to do is to rehash one blob and recompute only one tree object at the toplevel. We boast how lightweight git branches are and how fast switching between two branches is, but that's a serious lie. 
If done properly, we should be able to switch branches in a time
roughly proportional to the number of paths that differ between
the branches. Currently, the time is proportional to the size of
the tree, no matter how small the change between the trees is.

git-apply, which is used by rebase without -m, is optimized to make
it proportional to the size of the change. It not only knows to
touch just the affected paths (because the patch does not talk
about unaffected paths) and leave the others intact, but also
avoids expensive tree recomputation for unaffected directories, by
properly maintaining the cache-tree data in the index.

IIRC, Linus said unpack-trees was beyond repair several months ago,
and I tend to agree with him. Currently the first thing
unpack-trees does is to discard the cache-tree from the index,
because the code does not properly invalidate affected paths, and
it is probably way too cumbersome to add that invalidation to the
various places where the code modifies the index (I haven't looked
at it recently, so maybe somebody can try it and prove me wrong).

My gut feeling is that we may be better off redoing the tree-level
merge infrastructure from scratch, and making a new one that is
optimized for trees with small differences. There is prototype code
called test-para in 'pu' that implements such a multi-tree walk,
and we've also had its precursor (by Linus) called git-merge-tree
in 'master' for quite a long time, but unfortunately neither has
seen any activity recently.