Re: [RFC PATCH 3/3] git-rebase.sh: make git-rebase--interactive the default

Elijah Newren <newren@xxxxxxxxx> · Wed, 20 Jun 2018 09:27:54 -0700

Hi Dscho,

On Sun, Jun 17, 2018 at 2:44 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:

> I was really referring to speed. But I have to admit that I do not have
> any current numbers.
>
> Another issue just hit me, though: rebase --am does not need to look at as
> many Git objects as rebase --merge or rebase -i. Therefore, GVFS users
> will still want to use --am wherever possible, to avoid "hydrating"
> many objects during their rebase.

What is it that makes rebase --am need fewer Git objects than rebase
--merge or rebase -i?  I have one idea which isn't intrinsic to the
algorithm, so I'm curious if there's something else I'm unaware of.

My guess at what objects are needed by each type:

At a high level, rebase --am for each commit will need to compare the
commit to its parent to generate a diff (which thus involves walking
over the objects in both the commit and its parent, though it should
be able to skip over subtrees that are equal), and then will need to
look at all the objects in the target commit on which it needs to
apply the patch (in order to properly fill the index for a starting
point, and used later when creating a new commit).  If the application
of the diff fails, it falls back to a three-way merge, though the
three-way merge shouldn't need any additional objects.  So, to
summarize, rebase --am needs objects from the commit being rebased, its
parent, and the target commit onto which it is applying, though it can
short circuit some objects when the commit and its parent have
matching subtree(s).
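
To make the "skip over subtrees that are equal" part concrete, here is
a toy Python sketch.  It is not git's actual code; Node, diff_trees()
and all the object ids are made up for illustration.  It models a
commit's tree as nested nodes and records which tree objects the diff
had to open:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Toy stand-in for a git object: a tree if children is a dict, else a blob."""
    oid: str
    children: Optional[Dict[str, "Node"]] = None


def is_tree(n: Optional[Node]) -> bool:
    return n is not None and n.children is not None


def diff_trees(a: Optional[Node], b: Optional[Node],
               loaded: Set[str], prefix: str = "") -> List[str]:
    """Return the paths that differ; `loaded` collects every tree id we opened."""
    if a is not None and b is not None and a.oid == b.oid:
        return []                      # equal ids: skip the whole subtree unopened
    if not is_tree(a) and not is_tree(b):
        return [prefix]                # blob added, removed, or modified
    changed: List[str] = []
    if (a is not None and not is_tree(a)) or (b is not None and not is_tree(b)):
        changed.append(prefix)         # file <-> directory type change
    names: Set[str] = set()
    for n in (a, b):
        if is_tree(n):
            loaded.add(n.oid)          # this is where an object would be read
            names.update(n.children)
    for name in sorted(names):
        path = f"{prefix}/{name}" if prefix else name
        child_a = a.children.get(name) if is_tree(a) else None
        child_b = b.children.get(name) if is_tree(b) else None
        changed += diff_trees(child_a, child_b, loaded, path)
    return changed


# Hypothetical parent and commit: only src/a.c differs, docs/ is shared.
docs = Node("d1", {"README": Node("b3")})
parent = Node("p", {"src": Node("s1", {"a.c": Node("b1"), "b.c": Node("b2")}),
                    "docs": docs})
commit = Node("c", {"src": Node("s2", {"a.c": Node("b4"), "b.c": Node("b2")}),
                    "docs": docs})

loaded: Set[str] = set()
changed = diff_trees(parent, commit, loaded)
print(changed, sorted(loaded))
```

In this toy model the docs/ tree is never opened, because both sides
point at the same id.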

rebase -i, if I understand correctly, does a three-way merge between
the commit, its parent, and the target commit.  Thus, we again walk
over objects in those three commits; I think unpack_trees() does not
take advantage of matching trees to avoid descending into subtrees,
but if so that's an optimization that we may be able to implement
(though it would require diving into unpack_trees() code, which is
never easy...).
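
As a rough illustration of what such an optimization could buy, here
is a toy three-way tree merge in Python.  It is only a sketch under my
own assumptions (Node, merge3() and the ids are invented, and it says
nothing about how unpack_trees() is actually structured).  With
short_circuit=True it resolves a path at the tree level whenever two
of the three ids match; with short_circuit=False it always descends
to the blobs:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Node:
    """Toy object: a tree if children is a dict, otherwise a blob."""
    oid: str
    children: Optional[Dict[str, "Node"]] = None


def oid(n: Optional[Node]) -> Optional[str]:
    return n.oid if n is not None else None


def is_tree(n: Optional[Node]) -> bool:
    return n is not None and n.children is not None


def merge3(base, ours, theirs, loaded: Set[str], short_circuit: bool) -> Optional[Node]:
    """base = parent of the picked commit, ours = target, theirs = picked commit."""
    sides = (base, ours, theirs)
    # We can only recurse when every side that exists is a tree.
    can_descend = (any(is_tree(n) for n in sides)
                   and all(n is None or is_tree(n) for n in sides))
    if short_circuit or not can_descend:
        if oid(base) == oid(theirs):
            return ours                # picked commit did not touch this path
        if oid(base) == oid(ours):
            return theirs              # target did not touch this path
        if oid(ours) == oid(theirs):
            return ours                # both sides made the same change
        if not can_descend:
            raise ValueError("content conflict; out of scope for this toy")
    merged: Dict[str, Node] = {}
    names: Set[str] = set()
    for n in sides:
        if is_tree(n):
            loaded.add(n.oid)          # this is where an object would be read
            names.update(n.children)
    for name in sorted(names):
        kids = [n.children.get(name) if is_tree(n) else None for n in sides]
        child = merge3(kids[0], kids[1], kids[2], loaded, short_circuit)
        if child is not None:
            merged[name] = child
    label = ",".join(f"{k}:{merged[k].oid}" for k in sorted(merged))
    return Node(f"tree({label})", merged)   # synthetic id, not a real hash


def blobs(n: Optional[Node], prefix: str = "") -> Dict[str, str]:
    """Flatten a tree into {path: blob id} so results can be compared."""
    if n is None:
        return {}
    if n.children is None:
        return {prefix: n.oid}
    out: Dict[str, str] = {}
    for name, child in n.children.items():
        out.update(blobs(child, f"{prefix}/{name}" if prefix else name))
    return out


# Hypothetical inputs: the commit edits src/a.c, the target adds a Makefile.
docs = Node("d1", {"README": Node("b3")})
src_old = Node("s1", {"a.c": Node("b1"), "b.c": Node("b2")})
src_new = Node("s2", {"a.c": Node("b4"), "b.c": Node("b2")})
parent = Node("p", {"src": src_old, "docs": docs})
commit = Node("c", {"src": src_new, "docs": docs})
target = Node("o", {"src": src_old, "docs": docs, "Makefile": Node("b5")})

fast_loaded: Set[str] = set()
slow_loaded: Set[str] = set()
fast = merge3(parent, target, commit, fast_loaded, short_circuit=True)
slow = merge3(parent, target, commit, slow_loaded, short_circuit=False)
print(len(fast_loaded), len(slow_loaded), blobs(fast) == blobs(slow))
```

Both modes produce the same files here; the short-circuiting one just
opens only the three root trees instead of every subtree.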

(Side notes: (1) rebase --merge is basically the same as rebase -i
here; its path to reaching the recursive merge machinery is a bit
different but the resulting arguments are the same; (2) a real merge
between branches would require more objects because it would have to
do some revision walking to find a merge base, and a real merge base
is likely to differ more than just the parent commit.  But finding
merge bases isn't relevant to rebase --merge or rebase -i.)

Is there something else I'm missing that fundamentally makes rebase -i
need more objects?

> As to speed: that might be harder. But then, the performance might already
> be good enough. I do not have numbers (nor the time to generate them) to
> back up my hunch that --am is substantially faster than --merge.

I too have a hunch that --am is faster than --merge, on big enough
repos or repos with enough renames.  I can partially back it up with
an indirect number: at [1], it was reported that cherry-picks could be
sped up by a factor of 20-30 on some repos with lots of renames.  I
believe there are other performance improvements possible too, for the
--merge or -i cases.

I'm also curious now whether your comment on hydrating objects might
uncover additional areas where performance improvements could be made
for non-am-based rebases of large-enough repos.

Elijah

[1] https://public-inbox.org/git/CABPp-BH4LLzeJjE5cvwWQJ8xTj3m9oC-41Tu8BM8c7R0gQTjWw@xxxxxxxxxxxxxx/
(see also Peter's last reply in that thread, and compare to his first
post)