Re: [PATCH 4/4] KVM: PPC: Book3S HV: Flush guest mappings when turning dirty tracking on/off

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2018-12-12 at 15:17 +1100, Paul Mackerras wrote:
> This adds code to flush the partition-scoped page tables for a radix
> guest when dirty tracking is turned on or off for a memslot.  Only
> the
> guest real addresses covered by the memslot are flushed.  The reason
> for this is to get rid of any 2M PTEs in the partition-scoped page
> tables that correspond to host transparent huge pages, so that page
> dirtiness is tracked at a system page (4k or 64k) granularity rather
> than a 2M granularity.  The page tables are also flushed when turning
> dirty tracking off so that the memslot's address space can be
> repopulated with THPs if possible.
> 
> To do this, we add a new function
> kvmppc_radix_flush_memslot().  Since
> this does what's needed for kvmppc_core_flush_memslot_hv() on a radix
> guest, we now make kvmppc_core_flush_memslot_hv() call the new
> kvmppc_radix_flush_memslot() rather than calling kvm_unmap_radix()
> for each page in the memslot.  This has the effect of fixing a bug in
> that kvmppc_core_flush_memslot_hv() was previously calling
> kvm_unmap_radix() without holding the kvm->mmu_lock spinlock, which
> is required to be held.
> 

One comment below.
Either way:

Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>

> Signed-off-by: Paul Mackerras <paulus@xxxxxxxxxx>
> ---
>  arch/powerpc/include/asm/kvm_book3s.h  |  2 ++
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |  9 +++++----
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 20 ++++++++++++++++++++
>  arch/powerpc/kvm/book3s_hv.c           | 17 +++++++++++++++++
>  4 files changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h
> b/arch/powerpc/include/asm/kvm_book3s.h
> index 728d2b7..f8a5ac8 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -222,6 +222,8 @@ extern int kvm_test_age_radix(struct kvm *kvm,
> struct kvm_memory_slot *memslot,
>  			unsigned long gfn);
>  extern long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm,
>  			struct kvm_memory_slot *memslot, unsigned
> long *map);
> +extern void kvmppc_radix_flush_memslot(struct kvm *kvm,
> +			const struct kvm_memory_slot *memslot);
>  extern int kvmhv_get_rmmu_info(struct kvm *kvm, struct
> kvm_ppc_rmmu_info *info);
>  
>  /* XXX remove this export when load_last_inst() is generic */
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index a18afda..6f2d2fb 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -899,11 +899,12 @@ void kvmppc_core_flush_memslot_hv(struct kvm
> *kvm,
>  
>  	gfn = memslot->base_gfn;
>  	rmapp = memslot->arch.rmap;
> +	if (kvm_is_radix(kvm)) {
> +		kvmppc_radix_flush_memslot(kvm, memslot);
> +		return;
> +	}
> +
>  	for (n = memslot->npages; n; --n, ++gfn) {
> -		if (kvm_is_radix(kvm)) {
> -			kvm_unmap_radix(kvm, memslot, gfn);
> -			continue;
> -		}
>  		/*
>  		 * Testing the present bit without locking is OK
> because
>  		 * the memslot has been marked invalid already, and
> hence
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 52711eb..d675ad9 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -958,6 +958,26 @@ long kvmppc_hv_get_dirty_log_radix(struct kvm
> *kvm,
>  	return 0;
>  }
>  
> +void kvmppc_radix_flush_memslot(struct kvm *kvm,
> +				const struct kvm_memory_slot
> *memslot)
> +{
> +	unsigned long n;
> +	pte_t *ptep;
> +	unsigned long gpa;
> +	unsigned int shift;
> +
> +	gpa = memslot->base_gfn << PAGE_SHIFT;
> +	spin_lock(&kvm->mmu_lock);
> +	for (n = memslot->npages; n; --n) {
> +		ptep = __find_linux_pte(kvm->arch.pgtable, gpa,
> NULL, &shift);
> +		if (ptep && pte_present(*ptep))
> +			kvmppc_unmap_pte(kvm, ptep, gpa, shift,
> memslot,
> +					 kvm->arch.lpid);
> +		gpa += PAGE_SIZE;
> +	}
> +	spin_unlock(&kvm->mmu_lock);

I don't know how expensive it is calling __find_linux_pte() may times
for a 2M or 1G page when we know we've already invalidated the mapping?
Could be improved with:

end_gpa = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
for (; gpa < end_gpa; gpa += (1UL << shift)) {
	...
	if (!shift)
		shift = PAGE_SHIFT;
}

> +}
> +
>  static void add_rmmu_ap_encoding(struct kvm_ppc_rmmu_info *info,
>  				 int psize, int *indexp)
>  {
> diff --git a/arch/powerpc/kvm/book3s_hv.c
> b/arch/powerpc/kvm/book3s_hv.c
> index f4fbb7b5..074ff5b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4384,6 +4384,23 @@ static void
> kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
>  	 */
>  	if (npages)
>  		atomic64_inc(&kvm->arch.mmio_update);
> +
> +	/*
> +	 * For change == KVM_MR_MOVE or KVM_MR_DELETE, higher levels
> +	 * have already called kvm_arch_flush_shadow_memslot() to
> +	 * flush shadow mappings.  For KVM_MR_CREATE we have no
> +	 * previous mappings.  So the only case to handle is
> +	 * KVM_MR_FLAGS_ONLY when the KVM_MEM_LOG_DIRTY_PAGES bit
> +	 * has been changed.
> +	 * For radix guests, we flush on setting
> KVM_MEM_LOG_DIRTY_PAGES
> +	 * to get rid of any THP PTEs in the partition-scoped page
> tables
> +	 * so we can track dirtiness at the page level; we flush
> when
> +	 * clearing KVM_MEM_LOG_DIRTY_PAGES so that we can go back
> to
> +	 * using THP PTEs.
> +	 */
> +	if (change == KVM_MR_FLAGS_ONLY && kvm_is_radix(kvm) &&
> +	    ((new->flags ^ old->flags) & KVM_MEM_LOG_DIRTY_PAGES))
> +		kvmppc_radix_flush_memslot(kvm, old);
>  }
>  
>  /*



[Index of Archives]     [KVM Development]     [KVM ARM]     [KVM ia64]     [Linux Virtualization]     [Linux USB Devel]     [Linux Video]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux