Re: [RFC PATCH 13/15] KVM: x86/mmu: Split large pages during CLEAR_DIRTY_LOG

On Fri, Nov 19, 2021, David Matlack wrote:
> When using initially-all-set, large pages are not write-protected when
> dirty logging is enabled on the memslot. Instead they are
> write-protected once userspace invokes CLEAR_DIRTY_LOG for the first
> time, and only for the specific sub-region of the memslot that userspace
> wishes to clear.
> 
> Enhance CLEAR_DIRTY_LOG to also try to split large pages prior to
> write-protecting to avoid causing write-protection faults on vCPU
> threads. This also allows userspace to smear the cost of large page
> splitting across multiple ioctls rather than splitting the entire
> memslot when not using initially-all-set.
> 
> Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> ---
>  arch/x86/include/asm/kvm_host.h |  4 ++++
>  arch/x86/kvm/mmu/mmu.c          | 30 ++++++++++++++++++++++--------
>  2 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 432a4df817ec..6b5bf99f57af 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1591,6 +1591,10 @@ void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  				      const struct kvm_memory_slot *memslot,
>  				      int start_level);
> +void kvm_mmu_try_split_large_pages(struct kvm *kvm,

I would prefer we use "hugepage" when possible, mostly because that's the
terminology used by the kernel.  KVM is comically inconsistent, but if we make
an effort to use hugepage when adding new code, hopefully someday we'll have
enough momentum to commit fully to hugepage.
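
E.g. purely as an illustration (the exact spelling is whatever the series
settles on), the new declarations would read something like:

	void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
					  const struct kvm_memory_slot *memslot,
					  u64 start, u64 end,
					  int target_level);
	void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
					       const struct kvm_memory_slot *memslot,
					       int target_level);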

> +				   const struct kvm_memory_slot *memslot,
> +				   u64 start, u64 end,
> +				   int target_level);
>  void kvm_mmu_slot_try_split_large_pages(struct kvm *kvm,
>  					const struct kvm_memory_slot *memslot,
>  					int target_level);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6768ef9c0891..4e78ef2dd352 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1448,6 +1448,12 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  		gfn_t start = slot->base_gfn + gfn_offset + __ffs(mask);
>  		gfn_t end = slot->base_gfn + gfn_offset + __fls(mask);
>  
> +		/*
> +		 * Try to proactively split any large pages down to 4KB so that
> +		 * vCPUs don't have to take write-protection faults.
> +		 */
> +		kvm_mmu_try_split_large_pages(kvm, slot, start, end, PG_LEVEL_4K);

This should return a value.  If splitting succeeds, there should be no
hugepages left in the range, and so walking the page tables to write-protect
at 2M is unnecessary.  Same for the previous patch, although skipping the
write-protect path is a little less straightforward in that case.
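
E.g. a very rough sketch, assuming the helper is changed to return true iff
every hugepage in [start, end] was split down to 4K (hypothetical semantics,
not what this patch implements):

	bool split = kvm_mmu_try_split_huge_pages(kvm, slot, start, end,
						  PG_LEVEL_4K);

	if (!split) {
		/* One or more hugepages survived, write-protect the rest. */
		kvm_mmu_slot_gfn_write_protect(kvm, slot, start, PG_LEVEL_2M);

		/* Cross two hugepages? */
		if (ALIGN(start << PAGE_SHIFT, PMD_SIZE) !=
		    ALIGN(end << PAGE_SHIFT, PMD_SIZE))
			kvm_mmu_slot_gfn_write_protect(kvm, slot, end,
						       PG_LEVEL_2M);
	}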

> +
>  		kvm_mmu_slot_gfn_write_protect(kvm, slot, start, PG_LEVEL_2M);
>  
>  		/* Cross two large pages? */


