Re: Comparing rebase --am with --interactive via p3400

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alban,

On Fri, 27 Dec 2019, Alban Gruin wrote:

> Le 01/02/2019 à 07:04, Johannes Schindelin a écrit :
>
> > as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> > --am backend) and then with --keep-empty to force the interactive backend
> > to be used. Here are the best of 10, on my relatively powerful Windows 10
> > laptop, with current `master`.
> >
> > With regular rebase --am:
> >
> > 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> > 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> > 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
> >
> > with --keep-empty to force the interactive backend:
> >
> > 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> > 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> > 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
> >
> > I then changed it to -m to test the current scripted version, trying to
> > let it run overnight, but my laptop eventually went to sleep and the tests
> > were not even done. I'll let them continue and report back.
> >
> > My conclusion after seeing these numbers is: the interactive rebase is
> > really close to the performance of the --am backend. So to me, it makes a
> > total lot of sense to switch --merge over to it, and to make --merge the
> > default. We still should investigate why the split-index performance is so
> > significantly worse, though.
> >
> > Ciao,
> > Dscho
> >
>
> I investigated a bit on this.  From a quick glance at a callgrind trace,
> I can see that ce_write_entry() is called 20 601[1] times with `git am',
> but 739 802 times with the sequencer when the split-index is enabled.
>
> For reference, here are the timings, measured on my Linux machine, on a
> tmpfs, with git.git as the repo:
>
> `rebase --am':
> > 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
> > 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
> > 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)
> `rebase --quiet':
> > 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
> > 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)
>
> This comes from two things:
>
> 1. There is not enough shared entries in the index with the sequencer.
>
> do_write_index() is called only by do_write_locked_index() with `--am',
> but is also called by write_shared_index() with the sequencer once for
> every other commit.  As the latter is only called by
> write_locked_index(), which means that too_many_not_shared_entries()
> returns true for the sequencer, but never for `--am'.
>
> Removing the call to discard_index() in do_pick_commit() (as in the
> first attached patch) solve this particular issue, but this would
> require a more thorough analysis to see if it is actually safe to do.

Indeed. I offered these insights in #git-devel (slightly edited):

This `discard_index()` is in an awfully central location. I am rather
certain that it would cause problems to just remove it.

Looking at `do_merge()`: it explicitly discards and re-reads the index if
we had to spawn a `git merge` process (which we do if a strategy other
than `recursive` was specified, or if it is an octopus merge). But I am
wary of other code paths that might not be as careful.

I see that `do_exec()` is similarly careful.

One thing I cannot fail to notice: we do not re-read a changed index
after running the `prepare-commit-msg` hook, or for that matter, any other
hook. That could even be an old regression from the conversion of the
interactive rebase to using the sequencer rather than a shell script.

Further, `reset_merge()` seems to spawn `git reset --merge` without
bothering to re-read the possibly modified index. Its callees are
`rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`,
none of which seem to be careful, either, about checking whether the index
was modified in the meantime.

Technically, the in-memory index should also be discarded
in `apply_autostash()`, but so far, we do not have any callers of that
function, I don't think, that wants to do anything but release resources
and exit.

The `run_git_checkout()` function discards, as intended. I
am not quite sure whether it needs to, though, unless the `.git/index`
file _was_ modified (it _is_ possible, after all, to run `git rebase -i
HEAD`, and I do have a use case for that where one of my scripts generates
a todo script, sort of a `git cherry-pick --rebase-merges`, because
`cherry-pick` does not support that mode).

The `continue_single_pick()` function spawns a `git
commit` which could potentially modify the index through a hook, but the
first call site does not care and the second one guards against that
(erroring out...).

My biggest concern is with the `run_git_commit()` function: it does not
re-read a potentially-modified index (think of hooks).

We will need to be very careful with this `discard_index()`, I think, and
in my opinion there is a great opportunity here for cleaning up a little:
rather than discarding and re-reading the in-memory index without seeing
whether the on-disk index has changed at all appears a bit wasteful to me.

This could be refactored into a function that only discards and re-reads
the index if the mtime of `.git/index` changed. That function could then
also be taught to detect when the in-memory index has unwritten changes:
that would constitute a bug.

Ciao,
Dscho

>
> After this, ce_write() is still called much more by the sequencer.
>
> Here are the results of `rebase --quiet' without discarding the index:
>
> > 3400.2: rebase on top of a lot of unrelated changes             0.23(0.19+0.04)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.14(4.95+0.18)
> > 3400.6: rebase a lot of unrelated changes with split-index      5.02(4.87+0.15)
> The performance of the rebase is better in the two cases.
>
>
> 2. The base index is dropped by unpack_trees_start() and unpack_trees().
>
> Now, write_shared_index() is no longer called and write_locked_index()
> is less expensive than before according to callgrind.  But
> ce_write_entry() is still called 749 302 times (which is even more than
> before.)
>
> The only place where ce_write_entry() is called is in a loop in
> do_write_index().  The number of iterations is dictated by the size of
> the cache, and there is a trace2 probe dumping this value.
>
> For `--am', the value goes like this: 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4,
> 4, 4, 5, 5, 5, 5, … up until 101.
>
> For the sequencer, it goes like this: 1, 1, 3697, 3697, 3698, 3698,
> 3699, 3699, … up until 3796.
>
> The size of the cache is set in prepare_to_write_split_index().  It
> grows if a cache entry has no index (most of them should have one by
> now), or if the split index has no base index (with `--am', the split
> index always has a base.)  This comes from unpack_trees_start() -- it
> creates a new index, and unpack_trees() does not carry the base index,
> hence the size of the cache.
>
> The second attached patch (which is broken for the non-interactive
> rebase case) demonstrates what we could expect for the split-index case
> if we fix this:
>
> > 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.03)
> > 3400.4: rebase a lot of unrelated changes without split-index   5.81(5.62+0.17)
> > 3400.6: rebase a lot of unrelated changes with split-index      4.76(4.54+0.20)
> So, for everything related to the index, I think that’s it.
>
> [1] Numbers may vary, but they should remain in the same order of magnitude.
>
> Cheers,
> Alban
>
>

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux