Re: Comparing rebase --am with --interactive via p3400

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alban & Elijah,

On Sun, 29 Dec 2019, Alban Gruin wrote:

> Hi Elijah,
>
> Le 27/12/2019 à 23:45, Elijah Newren a écrit :
> > Hi Alban,
> >
> > On Fri, Dec 27, 2019 at 1:11 PM Alban Gruin <alban.gruin@xxxxxxxxx> wrote:
> >>
> >> Hi Johannes & Elijah,
> >>
> >> Le 01/02/2019 à 07:04, Johannes Schindelin a écrit :
> >>> Hi Elijah,
> >>>
> >>> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
> >>> --am backend) and then with --keep-empty to force the interactive backend
> >>> to be used. Here are the best of 10, on my relatively powerful Windows 10
> >>> laptop, with current `master`.
> >>>
> >>> With regular rebase --am:
> >>>
> >>> 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
> >>>
> >>> with --keep-empty to force the interactive backend:
> >>>
> >>> 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
> >>>
> >>> I then changed it to -m to test the current scripted version, trying to
> >>> let it run overnight, but my laptop eventually went to sleep and the tests
> >>> were not even done. I'll let them continue and report back.
> >>>
> >>> My conclusion after seeing these numbers is: the interactive rebase is
> >>> really close to the performance of the --am backend. So to me, it makes a
> >>> total lot of sense to switch --merge over to it, and to make --merge the
> >>> default. We still should investigate why the split-index performance is so
> >>> significantly worse, though.
> >>>
> >>> Ciao,
> >>> Dscho
> >>>
> >>
> >> I investigated a bit on this.  From a quick glance at a callgrind trace,
> >> I can see that ce_write_entry() is called 20 601[1] times with `git am',
> >> but 739 802 times with the sequencer when the split-index is enabled.
> >
> > Sweet, thanks for digging in and analyzing this.
> >
> >> For reference, here are the timings, measured on my Linux machine, on a
> >> tmpfs, with git.git as the repo:
> >>
> >> `rebase --am':
> >>> 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)
> >> `rebase --quiet':
> >
> > --quiet?  Isn't that flag supposed to work with both backends and not
> > imply either one?  We previously used --keep-empty, though there's a
> > chance that flag means we're not doing a fair comparison (since 'am'
> > will drop empty commits and thus have less work to do).  Is there any
> > chance you actually ran a different command, but when you went to
> > summarize just typed the wrong flag name?  Anyway, the best would
> > probably be to use --merge here (at the time Johannes and I were
> > testing, that wouldn't have triggered the sequencer, but it does now),
> > after first applying the en/rebase-backend series just to make sure
> > we're doing an apples to apples comparison.  However, I suspect that
> > empty commits probably weren't much of a factor and you did find some
> > interesting things...
> >
>
> Yes, I did use `--keep-empty' but misremembered it when writing this email…
>
> >>> 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
> >>> 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
> >>> 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)
> >>
> >> This comes from two things:
> >>
> >> 1. There is not enough shared entries in the index with the sequencer.
> >>
> >> do_write_index() is called only by do_write_locked_index() with `--am',
> >> but is also called by write_shared_index() with the sequencer once for
> >> every other commit.  As the latter is only called by
> >> write_locked_index(), which means that too_many_not_shared_entries()
> >> returns true for the sequencer, but never for `--am'.
> >>
> >> Removing the call to discard_index() in do_pick_commit() (as in the
> >> first attached patch) solve this particular issue, but this would
> >> require a more thorough analysis to see if it is actually safe to do.
> >
> > I'm actually surprised the sequencer would call discard_index(); I
> > would have thought it would have relied on merge_recursive() to do the
> > necessary index changes and updates other than writing the new index
> > out.  But I'm not quite as familar with the sequencer so perhaps
> > there's some reason I'm unaware of.  (Any chance this is a left-over
> > from when sequencer invoked external scripts to do the work, and thus
> > the index was updated in another processes' memory and on disk, and it
> > had to discard and re-read to get its own process updated?)
> >
>
> The sequencer re-reads the index after invoking an external command
> (either `git checkout', `git merge' or an `exec' command from the todo
> list), which makes sense.  But this one seems to come from 6eb1b437933
> ("cherry-pick/revert: make direct internal call to merge_tree()",
> 2008-09-02).  So, yes, quite old, and perhaps no longer justified.

Right. This commit also moved the `discard_cache()` call outside from the
`else` clause of the `if (no_commit)`.

That `else` clause goes all the way back to 9509af686bf (Make git-revert &
git-cherry-pick a builtin, 2007-03-01), and I admit freely that my memory
is no longer fresh on the specifics of this patch.

Looking at that patch, I think I simply discarded the index because a
subsequent code path would spawn the `git merge-recursive` process, which
would have changed the index externally.

> I know I had to add another discard_cache() in rebase--interactive.c
> because it broke something with the submodules, but it does not seems
> all that useful now that rebase.c no longer has to fork to use the
> sequencer.

FWIW I agree. The code is still quite complex at this point, but
infinitely more readable (thank you Elijah for taking point on simplifying
merge-recursive.c!). So I think that it might be the right point in time
to make sure that the index is not re-read and re-discarded over and over
again.

Thanks,
Dscho

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux