Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

Jerome Glisse <jglisse@xxxxxxxxxx> · Mon, 18 Feb 2019 11:04:13 -0500

On Mon, Feb 11, 2019 at 03:02:00PM -0500, Andrea Arcangeli wrote:
> On Mon, Feb 11, 2019 at 02:09:31PM -0500, Jerome Glisse wrote:
> > Yeah, between do you have any good workload for me to test this ? I
> > was thinking of running few same VM and having KSM work on them. Is
> > there some way to trigger KVM to fork ? As the other case is breaking
> > COW after fork.
> 
> KVM can fork on guest pci-hotplug events or network init to run host
> scripts and re-init the signals before doing the exec, but it won't
> move the needle because all guest memory registered in the MMU
> notifier is set as MADV_DONTFORK... so fork() is a noop unless qemu is
> also modified not to call MADV_DONTFORK.
> 
> Calling if (!fork()) exit(0) from a timer at regular intervals during
> qemu runtime after turning off MADV_DONTFORK in qemu would allow to
> exercise fork against the KVM MMU Notifier methods.
> 
> The optimized change_pte code in copy-on-write code is the same
> post-fork or post-KSM merge and fork() itself doesn't use change_pte
> while KSM does, so with regard to change_pte it should already provide
> a good test coverage to test with only KSM without fork(). It'll cover
> the read-write -> readonly transition with same PFN
> (write_protect_page), the read-only to read-only changing PFN
> (replace_page) as well as the readonly -> read-write transition
> changing PFN (wp_page_copy) all three optimized with change_pte. Fork
> would not leverage change_pte for the first two cases.

So i run 2 exact same VMs side by side (copy of same COW image) and
built the same kernel tree inside each (that is the only important
workload that exist ;)) but the change_pte did not have any impact:

before  mean  {real: 1358.250977, user: 16650.880859, sys: 839.199524, npages: 76855.390625}
before  stdev {real:    6.744010, user:   108.863762, sys:   6.840437, npages:  1868.071899}
after   mean  {real: 1357.833740, user: 16685.849609, sys: 839.646973, npages: 76210.601562}
after   stdev {real:    5.124797, user:    78.469360, sys:   7.009164, npages:  2468.017578}
without mean  {real: 1358.501343, user: 16674.478516, sys: 837.791992, npages: 76225.203125}
without stdev {real:    5.541104, user:    97.998367, sys:   6.715869, npages:  1682.392578}

Above is time taken by make inside each VM for all yes config. npages
is the number of page shared reported on the host at the end of the
build.

There is no change before and after the patchset to restore change
pte. I tried removing the change_pte callback alltogether to see if
that did have any effect (without above) and it did not have any
effect either.

Should we still restore change_pte() ? It does not hurt, but it does
not seems to help in anyway. Maybe you have a better benchmark i could
run ?

Cheers,
Jérôme