Re: Comparing rebase --am with --interactive via p3400

Elijah Newren <newren@xxxxxxxxx> · Fri, 27 Dec 2019 14:45:55 -0800

Hi Alban,

On Fri, Dec 27, 2019 at 1:11 PM Alban Gruin <alban.gruin@xxxxxxxxx> wrote:
>
> Hi Johannes & Elijah,
>
> On 01/02/2019 at 07:04, Johannes Schindelin wrote:
> > Hi Elijah,
> >
> > as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> > --am backend) and then with --keep-empty to force the interactive backend
> > to be used. Here are the best of 10 runs, on my relatively powerful Windows 10
> > laptop, with current `master`.
> >
> > With regular rebase --am:
> >
> > 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> > 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> > 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
> >
> > with --keep-empty to force the interactive backend:
> >
> > 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> > 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> > 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
> >
> > I then changed it to -m to test the current scripted version, trying to
> > let it run overnight, but my laptop eventually went to sleep and the tests
> > were not even done. I'll let them continue and report back.
> >
> > My conclusion after seeing these numbers is: the interactive rebase is
> > really close to the performance of the --am backend. So to me, it makes a
> > lot of sense to switch --merge over to it, and to make --merge the
> > default. We still should investigate why the split-index performance is so
> > significantly worse, though.
> >
> > Ciao,
> > Dscho
> >
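To put "really close" into numbers, here is a small sketch that computes the interactive backend's wall-clock time relative to --am for each p3400 test. It only rearranges the best-of-10 figures Johannes quoted; the helper name `relative_change` is ours, not part of p3400.

```python
# Best-of-10 wall-clock seconds from Johannes's p3400 runs
# (Windows 10 laptop, then-current master).
AM = {"3400.2": 5.32, "3400.4": 33.08, "3400.6": 30.29}
INTERACTIVE = {"3400.2": 3.92, "3400.4": 33.92, "3400.6": 38.82}


def relative_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate versus baseline (positive = slower)."""
    return (candidate - baseline) / baseline * 100.0


for test in AM:
    delta = relative_change(AM[test], INTERACTIVE[test])
    print(f"{test}: {delta:+.1f}%")
```

This prints about -26.3%, +2.5% and +28.2%: the interactive backend is faster or within a few percent on the first two tests, while the split-index test (3400.6) stands out as roughly 28% slower.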
>
> I investigated this a bit.  From a quick glance at a callgrind trace,
> I can see that ce_write_entry() is called 20,601[1] times with `git am',
> but 739,802 times with the sequencer when the split index is enabled.

Sweet, thanks for digging in and analyzing this.

> For reference, here are the timings, measured on my Linux machine, on a
> tmpfs, with git.git as the repo:
>
> `rebase --am':
> > 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
> > 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
> > 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)
> `rebase --quiet':

--quiet?  Isn't that flag supposed to work with both backends and not
imply either one?  We previously used --keep-empty, though there's a
chance that flag means we're not doing a fair comparison (since 'am'
will drop empty commits and thus have less work to do).  Is there any
chance you actually ran a different command and just typed the wrong
flag name when summarizing?  Anyway, it would probably be best to use
--merge here (at the time Johannes and I were testing, that wouldn't
have triggered the sequencer, but it does now), after first applying
the en/rebase-backend series, just to make sure we're doing an
apples-to-apples comparison.  However, I suspect that empty commits
weren't much of a factor, and you did find some interesting things...

> > 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
> > 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)
>
> This comes from two things:
>
> 1. There are not enough shared entries in the index with the sequencer.
>
> do_write_index() is called only by do_write_locked_index() with `--am',
> but with the sequencer it is also called by write_shared_index(), once
> for every other commit.  Since the latter is only called by
> write_locked_index() when it decides a new shared index is needed, this
> means that too_many_not_shared_entries() returns true for the
> sequencer, but never for `--am'.
>
> Removing the call to discard_index() in do_pick_commit() (as in the
> first attached patch) solves this particular issue, but this would
> require a more thorough analysis to see if it is actually safe to do.

I'm actually surprised the sequencer would call discard_index(); I
would have thought it relied on merge_recursive() to make the
necessary index changes and updates, other than writing the new index
out.  But I'm not quite as familiar with the sequencer, so perhaps
there's some reason I'm unaware of.  (Any chance this is a left-over
from when the sequencer invoked external scripts to do the work, so
the index was updated in another process's memory and on disk, and it
had to discard and re-read the index to update its own process's copy?)
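The threshold behind too_many_not_shared_entries() can be modeled roughly as below. This is a simplified sketch, not git's code: as we understand read-cache.c, it counts entries that have no position in the shared (base) index and compares that count against splitIndex.maxPercentChange of the total (documented default 20; 0 means always write a new shared index, 100 means never). The `Entry` type, its `shared_pos` field and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Documented default of the splitIndex.maxPercentChange config option.
DEFAULT_MAX_PERCENT_SPLIT_CHANGE = 20


@dataclass
class Entry:
    path: str
    # Hypothetical stand-in for the entry's position in the shared index:
    # 0 means "not in the shared index", a positive value is its 1-based slot.
    shared_pos: int = 0


def too_many_not_shared(entries: List[Entry],
                        max_percent: int = DEFAULT_MAX_PERCENT_SPLIT_CHANGE) -> bool:
    """Return True when this toy model would write a new shared index."""
    if max_percent == 0:
        return True   # 0%: always write a new shared index
    if max_percent == 100:
        return False  # 100%: never write a new shared index
    not_shared = sum(1 for e in entries if e.shared_pos == 0)
    # Integer comparison (no floats): total * pct < not_shared * 100
    return len(entries) * max_percent < not_shared * 100
```

With 80 shared and 20 unshared entries the model stays at exactly 20% and returns False; one more unshared entry tips it over. Anything that leaves many entries without a shared position, such as rebuilding the index on every pick, would hit this threshold far more often.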

> After this, ce_write() is still called much more by the sequencer.
>
> Here are the results of `rebase --quiet' without discarding the index:
>
> > 3400.2: rebase on top of a lot of unrelated changes             0.23(0.19+0.04)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.14(4.95+0.18)
> > 3400.6: rebase a lot of unrelated changes with split-index      5.02(4.87+0.15)
>
> The rebase is faster in both cases.

Nice.  :-)

> 2. The base index is dropped by unpack_trees_start() and unpack_trees().
>
> Now, write_shared_index() is no longer called and write_locked_index()
> is less expensive than before according to callgrind.  But
> ce_write_entry() is still called 749,302 times (even more than
> before).
>
> The only place where ce_write_entry() is called is in a loop in
> do_write_index().  The number of iterations is dictated by the size of
> the cache, and there is a trace2 probe dumping this value.
>
> For `--am', the value goes like this: 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,
> 4, 4, 5, 5, 5, 5, … up until 101.
>
> For the sequencer, it goes like this: 1, 1, 3697, 3697, 3698, 3698,
> 3699, 3699, … up until 3796.
>
> The size of the cache is set in prepare_to_write_split_index().  It
> grows if a cache entry has no index (most of them should have one by
> now), or if the split index has no base index (with `--am', the split
> index always has a base).  This comes from unpack_trees_start(): it
> creates a new index, and unpack_trees() does not carry over the base
> index, hence the size of the cache.
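The effect described above can be illustrated with a toy model: when the split index has a base, only entries that are new or changed relative to the base need to be written; without a base, every entry is. This is a hypothetical simplification of prepare_to_write_split_index() (it ignores deletions, deduplication and the replace bitmap), and the field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Entry:
    path: str
    shared_pos: int = 0     # hypothetical: 0 = not present in the base index
    modified: bool = False  # hypothetical: differs from its copy in the base


def entries_to_write(cache: Sequence[Entry], has_base: bool) -> List[Entry]:
    """Return the entries a split-index write would emit (toy model)."""
    if not has_base:
        # No base index: the whole cache lands in the split index, so the
        # write loop runs once per entry (thousands of times for git.git).
        return list(cache)
    # With a base index: only new or changed entries are written.
    return [e for e in cache if e.shared_pos == 0 or e.modified]
```

With 3,697 shared entries plus one new file, the model writes 1 entry when a base exists and 3,698 when it does not. That mirrors the shape, not the exact values, of the trace2 numbers Alban reported for `--am' and the sequencer.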
>
> The second attached patch (which is broken for the non-interactive
> rebase case) demonstrates what we could expect for the split-index case
> if we fix this:
>
> > 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.03)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.81(5.62+0.17)
> > 3400.6: rebase a lot of unrelated changes with split-index      4.76(4.54+0.20)
>
> So, for everything related to the index, I think that’s it.
>
> [1] Numbers may vary, but they should remain in the same order of magnitude.
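The split-index test (3400.6) from Alban's Linux/tmpfs runs on git.git can be summarized as overhead relative to `--am'. This sketch only rearranges the quoted numbers; the thread does not say whether the second patch's run also included the first patch, so each row is treated as an independent measurement.

```python
# 3400.6 ("rebase a lot of unrelated changes with split-index"), seconds,
# from Alban's Linux/tmpfs runs on git.git.
BASELINE_AM = 4.43
VARIANTS = {
    "sequencer": 5.67,
    "sequencer, no discard_index()": 5.02,
    "sequencer, second patch": 4.76,
}


def overhead_pct(seconds: float, baseline: float = BASELINE_AM) -> float:
    """Percent slower than the --am baseline (negative would mean faster)."""
    return (seconds - baseline) / baseline * 100.0


for name, secs in VARIANTS.items():
    print(f"{name:32} {secs:5.2f}s  {overhead_pct(secs):+5.1f}%")
```

This prints about +28.0%, +13.3% and +7.4%: each change narrows the gap to `--am' in the split-index case, though neither closes it entirely.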

Unfortunately, this patch as-is breaks some important things, even if
the breakage only shows up in a few testcases.  merge-recursive needs
to know both what the index looked like before the merge started and
what it looks like after unpack-trees runs; see commits 1de70dbd1a
(merge-recursive: fix check for skipability of working tree updates,
2018-04-19) and a35edc84bd (merge-recursive: fix was_tracked() to quit
lying with some renamed paths, 2018-04-19), and maybe a few others
from that series.
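The need for both views can be illustrated with a toy model. As we understand the merge-recursive.c of this era, unpack_trees_start() keeps a copy of the pre-merge index (orig_index) while the repository's index becomes the unpack_trees() result, and was_tracked() consults the saved copy. The class and method names below are hypothetical; only the idea of keeping both views comes from this thread.

```python
from typing import Dict

# Toy index: path -> blob id. Real index entries carry much more state.
Index = Dict[str, str]


class ToyMerge:
    """Hypothetical model: keep the pre-merge index next to the result."""

    def __init__(self, index: Index) -> None:
        self.orig_index: Index = dict(index)  # snapshot taken before unpacking
        self.index: Index = dict(index)       # will hold the merged result

    def unpack(self, merged: Index) -> None:
        # Replace the working view with the unpack result; the snapshot
        # in orig_index is left untouched.
        self.index = dict(merged)

    def was_tracked(self, path: str) -> bool:
        # Answer from the *pre-merge* index, not the post-unpack one.
        return path in self.orig_index
```

After `ToyMerge({"a.c": "111"})` unpacks a result that adds `b.c`, `was_tracked("b.c")` is False even though `b.c` is in the result. A variant that reused a single index for both views would answer True, which is the kind of confusion those commits fixed.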

But knowing that it comes from the differences in the index as
unpack_trees runs is useful info.  I might be restructuring this code
somewhat significantly, but it helps to have this in mind; I may spot
opportunities to do something with it while I'm digging in...

Elijah