Re: [PATCH v2 2/3] KVM: selftests: Randomize which pages are written vs read.

David Matlack <dmatlack@xxxxxxxxxx> writes:

> On Tue, Aug 30, 2022 at 07:02:10PM +0000, Colton Lewis wrote:
> > David Matlack <dmatlack@xxxxxxxxxx> writes:

> > > On Wed, Aug 17, 2022 at 09:41:45PM +0000, Colton Lewis wrote:
> > > > Randomize which pages are written vs read using the random number
> > > > arrays. Change the variable wr_fract and associated function calls to
> > > > write_percent, which operates as a percentage from 0 to 100 where X
> > > > means each page has an X% chance of being written. Change the -f
> > > > argument to -w to reflect the new variable semantics. Keep the same
> > > > default of 100 percent writes.
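
To make the new -w semantics concrete, the per-page decision boils down
to something like the sketch below. This is illustrative only; the names
here are placeholders, not the patch's actual code:

  #include <stdint.h>

  /* Access one guest page, writing with probability write_percent/100.
   * rand_for_page is assumed to come from the pre-generated random
   * number arrays this series adds.
   */
  static void access_page(uint64_t *page_va, uint32_t rand_for_page,
                          uint32_t write_percent)
  {
          if (rand_for_page % 100 < write_percent)
                  *page_va = 1;                        /* dirty the page */
          else
                  (void)*(volatile uint64_t *)page_va; /* read-only access */
  }

With write_percent at 100 (the default) every access is a write, and at
0 every access is a read, matching the behavior described above.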

> > > Doesn't the new option cause like a 1000x slowdown in "Dirty memory
> > > time"? I don't think we should merge this until that is understood and
> > > addressed (and it should be at least called out here so that reviewers
> > > can be made aware).


> > I'm guessing you got that from my internally posted tests. This option
> > itself does not cause the slowdown. If this option is set to 0% or 100%
> > (the default), there is no slowdown at all. The slowdown I measured was
> > at 50%, probably because an equal chance of doing a read or a write on
> > each access makes branch prediction impossible. This is a good thing:
> > it's much more realistic than predictably alternating reads and writes.

> I found it hard to believe that branch prediction could affect
> performance by 1000x (and why wouldn't random access order show the same
> effect?), so I looked into it further.

> The cause of the slowdown is actually MMU lock contention:

> -   82.62%  [k] queued_spin_lock_slowpath
>     - 82.09% queued_spin_lock_slowpath
>        - 48.36% queued_write_lock_slowpath
>           - _raw_write_lock
>              - 22.18% kvm_mmu_notifier_invalidate_range_start
>                   __mmu_notifier_invalidate_range_start
>                   wp_page_copy
>                   do_wp_page
>                   __handle_mm_fault
>                   handle_mm_fault
>                   __get_user_pages
>                   get_user_pages_unlocked
>                   hva_to_pfn
>                   __gfn_to_pfn_memslot
>                   kvm_faultin_pfn
>                   direct_page_fault
>                   kvm_tdp_page_fault
>                   kvm_mmu_page_fault
>                   handle_ept_violation

> I think the bug is due to the following:
>   1. Randomized reads/writes were being applied to the Populate phase,
>      which (when using anonymous memory) results in the guest memory being
>      mapped to the Zero Page.
>   2. The random access order changed across each iteration (Populate phase
>      included), which means that some pages were written to during each
>      iteration for the first time. Those pages resulted in a copy-on-write
>      in the host MM fault handler, which invokes the invalidate range
>      notifier and acquires the MMU lock in write-mode.
>   3. All vCPUs are doing this in parallel, which results in a ton of lock
>      contention.

> Your internal test results also showed that performance got better
> during each iteration. That's because more and more of the guest memory
> was faulted in with each iteration (fewer and fewer copy-on-write
> faults needed to acquire the MMU lock in write-mode).


Thanks for the analysis, David. I had wondered about the effects of
randomized reads/writes during the populate phase.
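
To spell out the zero-page/copy-on-write sequence from your points 1
and 2, here's a minimal user-space sketch of what each vCPU's memory
was going through (untested):

  #include <stdint.h>
  #include <sys/mman.h>

  int main(void)
  {
          size_t len = 2UL * 1024 * 1024;
          uint8_t *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          if (mem == MAP_FAILED)
                  return 1;

          /* Read fault: the PTE maps the shared zero page, read-only.
           * A guest "populating" its memory with reads ends up here.
           */
          (void)*(volatile uint8_t *)mem;

          /* First write: breaks CoW on the zero page (wp_page_copy),
           * which, when the memory backs a KVM guest, fires the
           * invalidate-range MMU notifier and takes the MMU lock in
           * write-mode.
           */
          mem[0] = 1;

          munmap(mem, len);
          return 0;
  }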

> The proper fix for v4 would be to set write-percent to 100 during the
> populate phase so all memory actually gets populated. Then only use the
> provided write-percent for testing dirty logging.


Will do that.
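
Concretely, I'm picturing something along these lines for v4 (untested;
the helper names below are placeholders, not the selftest's actual API):

  #include <stdint.h>

  /* Placeholder helpers standing in for the real selftest plumbing. */
  extern void set_write_percent(uint32_t write_percent);
  extern void run_vcpus_once(void);

  static void run_test(int iterations, uint32_t write_percent)
  {
          int i;

          /* Populate phase: force 100% writes so every page gets a real
           * backing page up front instead of a zero-page mapping.
           */
          set_write_percent(100);
          run_vcpus_once();

          /* Dirty logging iterations: use the -w value from the user. */
          set_write_percent(write_percent);
          for (i = 0; i < iterations; i++)
                  run_vcpus_once();
  }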


