Hi Dscho,

On Sun, Jun 17, 2018 at 2:44 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
> I was really referring to speed. But I have to admit that I do not have
> any current numbers.
>
> Another issue just hit me, though: rebase --am does not need to look at as
> many Git objects as rebase --merge or rebase -i. Therefore, GVFS users
> will still want to use --am wherever possible, to avoid "hydrating"
> many objects during their rebase.

What is it that makes rebase --am need fewer Git objects than
rebase --merge or rebase -i? I have one idea which isn't intrinsic to
the algorithm, so I'm curious whether there's something else I'm
unaware of.

My guess at what objects each type needs:

At a high level, for each commit rebase --am needs to compare the commit
to its parent to generate a diff (which involves walking over the objects
in both the commit and its parent, though it should be able to skip over
subtrees that are equal). It then needs to look at all the objects in the
target commit onto which it applies the patch (to properly fill the index
as a starting point, and later when creating a new commit). If applying
the diff fails, it falls back to a three-way merge, though the three-way
merge shouldn't need any additional objects. So, to summarize,
rebase --am needs objects from the commit being rebased, its parent, and
the target commit onto which it is applying, though it can short-circuit
some objects when the commit and its parent have matching subtree(s).

rebase -i, if I understand correctly, does a three-way merge between the
commit, its parent, and the target commit. Thus, we again walk over
objects in those three commits. I think unpack_trees() does not take
advantage of matching trees to avoid descending into subtrees; if that's
right, it's an optimization we may be able to implement (though it would
require diving into the unpack_trees() code, which is never easy...).
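A toy Python model may make the object-count comparison above more
concrete. It is a sketch under stated assumptions, not Git's code: trees
are content-addressed (equal IDs imply equal content), a "read" stands in
for hydrating an object, and ObjectStore, diff_trees, and merge_trees are
hypothetical names. diff_trees mirrors the --am step of diffing a commit
against its parent while skipping equal subtrees (reading the target
commit to fill the index is not modeled). merge_trees is a three-way tree
merge whose skip_equal flag resolves a subtree that one side left
untouched without descending into it, roughly the optimization suggested
for unpack_trees(). The index, rename detection, and file/directory
conflicts are ignored.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store; a "read" stands in for hydrating an object (hypothetical model)."""
    def __init__(self):
        self.objects = {}
        self.reads = set()  # distinct object IDs read so far
    def _put(self, kind, payload, obj):
        oid = hashlib.sha1(f"{kind}:{payload}".encode()).hexdigest()
        self.objects[oid] = (kind, obj)
        return oid
    def put_blob(self, content):
        return self._put("blob", content, content)
    def put_tree(self, entries):  # entries: name -> object ID
        payload = ";".join(f"{n}={o}" for n, o in sorted(entries.items()))
        return self._put("tree", payload, dict(entries))
    def read(self, oid):
        self.reads.add(oid)
        return self.objects[oid][1]
    def is_tree(self, oid):
        # Assumes the mode is known from the parent tree entry, so no read.
        return self.objects[oid][0] == "tree"

def diff_trees(store, old, new, prefix=""):
    """--am side: paths that differ between two trees. Equal IDs mean equal
    content, so matching subtrees are never read. Assumes no file<->dir changes."""
    if old == new:
        return []
    old_entries = store.read(old) if old else {}
    new_entries = store.read(new) if new else {}
    changed = []
    for name in sorted(old_entries.keys() | new_entries.keys()):
        a, b = old_entries.get(name), new_entries.get(name)
        if a == b:
            continue  # unchanged entry: never opened
        if store.is_tree(a or b):
            changed += diff_trees(store, a, b, prefix + name + "/")
        else:
            for oid in (a, b):  # building a patch needs both blob contents
                if oid:
                    store.read(oid)
            changed.append(prefix + name)
    return changed

def merge_trees(store, base, ours, theirs, conflicts, skip_equal=True, prefix=""):
    """--merge / -i side: toy three-way tree merge returning the merged tree ID.
    skip_equal=True resolves a subtree that one side left untouched without
    descending into it. Assumes no file<->directory conflicts."""
    if skip_equal:
        if ours == theirs or base == theirs:
            return ours
        if base == ours:
            return theirs
    entries = [store.read(tree) if tree else {} for tree in (base, ours, theirs)]
    merged = {}
    for name in sorted(set().union(*entries)):
        b, o, t = (e.get(name) for e in entries)
        if store.is_tree(b or o or t):
            result = merge_trees(store, b, o, t, conflicts, skip_equal,
                                 prefix + name + "/")
        elif o == t or b == t:
            result = o
        elif b == o:
            result = t
        else:  # both sides changed the file: a content merge reads all three blobs
            for oid in (b, o, t):
                if oid:
                    store.read(oid)
            conflicts.append(prefix + name)
            result = o  # placeholder; a real merge would write merged content
        if result:
            merged[name] = result
    return store.put_tree(merged) if merged else None

# Hypothetical history: commit changes src/a.txt, target adds docs/, big/ untouched.
store = ObjectStore()
big = store.put_tree({f"f{i}": store.put_blob(f"data {i}") for i in range(10)})
src_old = store.put_tree({"a.txt": store.put_blob("1")})
src_new = store.put_tree({"a.txt": store.put_blob("2")})
docs = store.put_tree({"readme": store.put_blob("x")})
parent = store.put_tree({"big": big, "src": src_old})
commit = store.put_tree({"big": big, "src": src_new})
target = store.put_tree({"big": big, "src": src_old, "docs": docs})

store.reads.clear()
am_changes = diff_trees(store, parent, commit)
am_reads = len(store.reads)

merge_reads = {}
for skip in (True, False):
    store.reads.clear()
    conflicts = []
    merged = merge_trees(store, parent, target, commit, conflicts, skip_equal=skip)
    merge_reads[skip] = len(store.reads)
print(am_changes, am_reads, merge_reads)  # ['src/a.txt'] 6 {True: 3, False: 7}
```

In this toy history the non-skipping merge opens every subtree it meets,
while the skipping one stops at the three root trees; the gap grows with
the number of untouched subtrees, but the model says nothing definite
about how unpack_trees() behaves today.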
(Side notes: (1) rebase --merge is basically the same as rebase -i here;
its path to the recursive merge machinery is a bit different, but the
resulting arguments are the same. (2) A real merge between branches would
require more objects, because it would have to do some revision walking
to find a merge base, and a real merge base is likely to differ from the
merged commits by more than a parent commit differs from its child. But
finding merge bases isn't relevant to rebase -m or rebase -i.)

Is there something else I'm missing that fundamentally makes rebase -i
need more objects?

> As to speed: that might be harder. But then, the performance might already
> be good enough. I do not have numbers (nor the time to generate them) to
> back up my hunch that --am is substantially faster than --merge.

I too have a hunch that --am is faster than --merge on big enough repos,
or on repos with enough renames. I can partially back it up with an
indirect number: at [1], it was reported that cherry-picks could be sped
up by a factor of 20-30 on some repos with lots of renames. I believe
other performance improvements are possible for the --merge and -i cases
as well. I'm also curious now whether your comment on hydrating objects
might uncover additional areas where performance could be improved for
non-am-based rebases of large enough repos.

Elijah

[1] https://public-inbox.org/git/CABPp-BH4LLzeJjE5cvwWQJ8xTj3m9oC-41Tu8BM8c7R0gQTjWw@xxxxxxxxxxxxxx/
(see also Peter's last reply in that thread, and compare it to his first post)