Re: Comparing rebase --am with --interactive via p3400

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Elijah,

Le 27/12/2019 à 23:45, Elijah Newren a écrit :
> Hi Alban,
> 
> On Fri, Dec 27, 2019 at 1:11 PM Alban Gruin <alban.gruin@xxxxxxxxx> wrote:
>>
>> Hi Johannes & Elijah,
>>
>> Le 01/02/2019 à 07:04, Johannes Schindelin a écrit :
>>> Hi Elijah,
>>>
>>> as discussed at the Contributors' Summit, I ran p3400 as-is (i.e. with the
>>> --am backend) and then with --keep-empty to force the interactive backend
>>> to be used. Here are the best of 10, on my relatively powerful Windows 10
>>> laptop, with current `master`.
>>>
>>> With regular rebase --am:
>>>
>>> 3400.2: rebase on top of a lot of unrelated changes             5.32(0.06+0.15)
>>> 3400.4: rebase a lot of unrelated changes without split-index   33.08(0.04+0.18)
>>> 3400.6: rebase a lot of unrelated changes with split-index      30.29(0.03+0.18)
>>>
>>> with --keep-empty to force the interactive backend:
>>>
>>> 3400.2: rebase on top of a lot of unrelated changes             3.92(0.03+0.18)
>>> 3400.4: rebase a lot of unrelated changes without split-index   33.92(0.03+0.22)
>>> 3400.6: rebase a lot of unrelated changes with split-index      38.82(0.03+0.16)
>>>
>>> I then changed it to -m to test the current scripted version, trying to
>>> let it run overnight, but my laptop eventually went to sleep and the tests
>>> were not even done. I'll let them continue and report back.
>>>
>>> My conclusion after seeing these numbers is: the interactive rebase is
>>> really close to the performance of the --am backend. So to me, it makes a
>>> total lot of sense to switch --merge over to it, and to make --merge the
>>> default. We still should investigate why the split-index performance is so
>>> significantly worse, though.
>>>
>>> Ciao,
>>> Dscho
>>>
>>
>> I investigated a bit on this.  From a quick glance at a callgrind trace,
>> I can see that ce_write_entry() is called 20 601[1] times with `git am',
>> but 739 802 times with the sequencer when the split-index is enabled.
> 
> Sweet, thanks for digging in and analyzing this.
> 
>> For reference, here are the timings, measured on my Linux machine, on a
>> tmpfs, with git.git as the repo:
>>
>> `rebase --am':
>>> 3400.2: rebase on top of a lot of unrelated changes             0.29(0.24+0.03)
>>> 3400.4: rebase a lot of unrelated changes without split-index   6.77(6.51+0.22)
>>> 3400.6: rebase a lot of unrelated changes with split-index      4.43(4.29+0.13)
>> `rebase --quiet':
> 
> --quiet?  Isn't that flag supposed to work with both backends and not
> imply either one?  We previously used --keep-empty, though there's a
> chance that flag means we're not doing a fair comparison (since 'am'
> will drop empty commits and thus have less work to do).  Is there any
> chance you actually ran a different command, but when you went to
> summarize just typed the wrong flag name?  Anyway, the best would
> probably be to use --merge here (at the time Johannes and I were
> testing, that wouldn't have triggered the sequencer, but it does now),
> after first applying the en/rebase-backend series just to make sure
> we're doing an apples to apples comparison.  However, I suspect that
> empty commits probably weren't much of a factor and you did find some
> interesting things...
> 

Yes, I did use `--keep-empty' but misremembered it when writing this email…

>>> 3400.2: rebase on top of a lot of unrelated changes             0.24(0.21+0.02)
>>> 3400.4: rebase a lot of unrelated changes without split-index   5.60(5.32+0.27)
>>> 3400.6: rebase a lot of unrelated changes with split-index      5.67(5.40+0.26)
>>
>> This comes from two things:
>>
>> 1. There is not enough shared entries in the index with the sequencer.
>>
>> do_write_index() is called only by do_write_locked_index() with `--am',
>> but is also called by write_shared_index() with the sequencer once for
>> every other commit.  As the latter is only called by
>> write_locked_index(), which means that too_many_not_shared_entries()
>> returns true for the sequencer, but never for `--am'.
>>
>> Removing the call to discard_index() in do_pick_commit() (as in the
>> first attached patch) solve this particular issue, but this would
>> require a more thorough analysis to see if it is actually safe to do.
> 
> I'm actually surprised the sequencer would call discard_index(); I
> would have thought it would have relied on merge_recursive() to do the
> necessary index changes and updates other than writing the new index
> out.  But I'm not quite as familar with the sequencer so perhaps
> there's some reason I'm unaware of.  (Any chance this is a left-over
> from when sequencer invoked external scripts to do the work, and thus
> the index was updated in another processes' memory and on disk, and it
> had to discard and re-read to get its own process updated?)
> 

The sequencer re-reads the index after invoking an external command
(either `git checkout', `git merge' or an `exec' command from the todo
list), which makes sense.  But this one seems to come from 6eb1b437933
("cherry-pick/revert: make direct internal call to merge_tree()",
2008-09-02).  So, yes, quite old, and perhaps no longer justified.

I know I had to add another discard_cache() in rebase--interactive.c
because it broke something with the submodules, but it does not seems
all that useful now that rebase.c no longer has to fork to use the
sequencer.

Cheers,
Alban




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux