Re: [PATCH v2 0/9] KVM: x86/MMU: Optimize disabling dirty logging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 21, 2022 at 03:43:49PM -0700, Ben Gardon wrote:
> Currently disabling dirty logging with the TDP MMU is extremely slow.
> On a 96 vCPU / 96G VM it takes ~256 seconds to disable dirty logging
> with the TDP MMU, as opposed to ~4 seconds with the legacy MMU. This
> series optimizes TLB flushes and introduces in-place large page
> promotion, to bring the disable dirty log time down to ~3 seconds.
> 
> Testing:
> Ran KVM selftests and kvm-unit-tests on an Intel Haswell. This
> series introduced no new failures.
> 
> Performance:
> 
> Without this series, TDP MMU:
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 4.972184425s
> Enabling dirty logging time: 0.001943807s
> 
> Iteration 1 dirty memory time: 0.061862112s
> Iteration 1 get dirty log time: 0.001416413s
> Iteration 1 clear dirty log time: 1.417428057s
> Iteration 2 dirty memory time: 0.664103656s
> Iteration 2 get dirty log time: 0.000676724s
> Iteration 2 clear dirty log time: 1.149387201s
> Disabling dirty logging time: 256.682188868s
> Get dirty log over 2 iterations took 0.002093137s. (Avg 0.001046568s/iteration)
> Clear dirty log over 2 iterations took 2.566815258s. (Avg 1.283407629s/iteration)
> 
> Without this series, Legacy MMU:
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 4.892940915s
> Enabling dirty logging time: 0.001864603s
> 
> Iteration 1 dirty memory time: 0.060490391s
> Iteration 1 get dirty log time: 0.001416277s
> Iteration 1 clear dirty log time: 0.323548614s
> Iteration 2 dirty memory time: 29.217064826s
> Iteration 2 get dirty log time: 0.000696202s
> Iteration 2 clear dirty log time: 0.907089084s
> Disabling dirty logging time: 4.246216551s
> Get dirty log over 2 iterations took 0.002112479s. (Avg 0.001056239s/iteration)
> Clear dirty log over 2 iterations took 1.230637698s. (Avg 0.615318849s/iteration)
> 
> With this series, TDP MMU:
> (Updated since RFC. Pulling out patches 1-4 could have a performance impact.)
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 4.878083336s
> Enabling dirty logging time: 0.001874340s
> 
> Iteration 1 dirty memory time: 0.054867383s
> Iteration 1 get dirty log time: 0.001368377s
> Iteration 1 clear dirty log time: 1.406960856s
> Iteration 2 dirty memory time: 0.679301083s
> Iteration 2 get dirty log time: 0.000662905s
> Iteration 2 clear dirty log time: 1.138263359s
> Disabling dirty logging time: 3.169381810s
> Get dirty log over 2 iterations took 0.002031282s. (Avg 0.001015641s/iteration)
> Clear dirty log over 2 iterations took 2.545224215s. (Avg 1.272612107s/iteration)
> 
> Patch breakdown:
> Patches 1-4 remove the need for a vCPU pointer to make_spte
> Patches 5-8 are small refactors in preparation for in-place lpage promotion
> Patch 9 implements in-place largepage promotion when disabling dirty logging
> 
> Changelog:
> RFC -> v1:
> 	Dropped the first 4 patches from the series. Patch 1 was sent
> 	separately, patches 2-4 will be taken over by Sean Christopherson.
> 	Incorporated David Matlack's Reviewed-by.
> v1 -> v2:
> 	Several patches were queued and dropped from this revision.
> 	Incorporated feedback from Peter Xu on the last patch in the series.
> 	Refreshed performance data
> 		Between versions 1 and 2 of this series, disable time without
> 		the TDP MMU went from 45s to 256, a major regression. I was
> 		testing on a skylake before and haswell this time, but that
> 		does not explain the huge performance loss.
> 
> Ben Gardon (9):
>   KVM: x86/mmu: Move implementation of make_spte to a helper
>   KVM: x86/mmu: Factor mt_mask out of __make_spte
>   KVM: x86/mmu: Factor shadow_zero_check out of __make_spte
>   KVM: x86/mmu: Replace vcpu argument with kvm pointer in make_spte
>   KVM: x86/mmu: Factor out the meat of reset_tdp_shadow_zero_bits_mask
>   KVM: x86/mmu: Factor out part of vmx_get_mt_mask which does not depend
>     on vcpu
>   KVM: x86/mmu: Add try_get_mt_mask to x86_ops
>   KVM: x86/mmu: Make kvm_is_mmio_pfn usable outside of spte.c
>   KVM: x86/mmu: Promote pages in-place when disabling dirty logging

Use () after function names to make it clear you are referring to a
function and not something else. e.g.

  KVM: x86/mmu: Move implementation of make_spte to a helper

becomes

  KVM: x86/mmu: Move implementation of make_spte() to a helper

This applies throughout the series, in commit messages and comments.

> 
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  2 +
>  arch/x86/kvm/mmu/mmu.c             | 21 +++++----
>  arch/x86/kvm/mmu/mmu_internal.h    |  6 +++
>  arch/x86/kvm/mmu/spte.c            | 39 +++++++++++-----
>  arch/x86/kvm/mmu/spte.h            |  6 +++
>  arch/x86/kvm/mmu/tdp_mmu.c         | 73 +++++++++++++++++++++++++++++-
>  arch/x86/kvm/svm/svm.c             |  9 ++++
>  arch/x86/kvm/vmx/vmx.c             | 25 ++++++++--
>  9 files changed, 155 insertions(+), 27 deletions(-)
> 
> -- 
> 2.35.1.894.gb6a874cedc-goog
> 



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux