Re: [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API

Alexandru Elisei <alexandru.elisei@xxxxxxx> · Wed, 2 Sep 2020 17:23:08 +0100

Hello,

On 8/25/20 10:39 AM, Will Deacon wrote:
> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
> of walking the page-table directly.
>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Quentin Perret <qperret@xxxxxxxxxx>
> Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> ---
>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>  1 file changed, 32 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 704b471a48ce..751ce2462765 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>  }
>  
> +/*
> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
> + * long will also starve other vCPUs. We have to also make sure that the page
> + * tables are not freed while we released the lock.
> + */
> +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
> +({									\
> +	int ret;							\
> +	struct kvm *__kvm = (kvm);					\
> +	bool __resched = (resched);					\
> +	u64 next, __addr = (addr), __end = (end);			\
> +	do {								\
> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
> +		if (!pgt)						\
> +			break;						\

I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
to me like the compiler might decide to optimize by reading pgt once at the start
of the loop and stashing it in a register. Would you mind explaining what I am
missing?

> +		next = stage2_pgd_addr_end(__kvm, __addr, __end);	\
> +		ret = fn(pgt, __addr, next - __addr);			\
> +		if (ret)						\
> +			break;						\
> +		if (__resched && next != __end)				\
> +			cond_resched_lock(&__kvm->mmu_lock);		\
> +	} while (__addr = next, __addr != __end);			\
> +	ret;								\
> +})

This seems unusual to me. We have a non-trivial, multiline macro which calls
cond_resched(), has 6 local variables, and is called from exactly one place.I am
curious why we are not open coding the loop in __unmap_stage2_range() or using a
function.

> +
>  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>  {
>  	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> @@ -220,8 +247,8 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
>   * end up writing old data to disk.
>   *
>   * This is why right after unmapping a page/section and invalidating
> - * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
> - * the IO subsystem will never hit in the cache.
> + * the corresponding TLBs, we flush to make sure the IO subsystem will
> + * never hit in the cache.
>   *
>   * This is all avoided on systems that have ARM64_HAS_STAGE2_FWB, as
>   * we then fully enforce cacheability of RAM, no matter what the guest
> @@ -344,32 +371,12 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>  				 bool may_block)
>  {
>  	struct kvm *kvm = mmu->kvm;
> -	pgd_t *pgd;
> -	phys_addr_t addr = start, end = start + size;
> -	phys_addr_t next;
> +	phys_addr_t end = start + size;
>  
>  	assert_spin_locked(&kvm->mmu_lock);
>  	WARN_ON(size & ~PAGE_MASK);
> -
> -	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
> -	do {
> -		/*
> -		 * Make sure the page table is still active, as another thread
> -		 * could have possibly freed the page table, while we released
> -		 * the lock.
> -		 */
> -		if (!READ_ONCE(mmu->pgd))
> -			break;
> -		next = stage2_pgd_addr_end(kvm, addr, end);
> -		if (!stage2_pgd_none(kvm, *pgd))
> -			unmap_stage2_p4ds(mmu, pgd, addr, next);
> -		/*
> -		 * If the range is too large, release the kvm->mmu_lock
> -		 * to prevent starvation and lockup detector warnings.
> -		 */
> -		if (may_block && next != end)
> -			cond_resched_lock(&kvm->mmu_lock);
> -	} while (pgd++, addr = next, addr != end);
> +	WARN_ON(stage2_apply_range(kvm, start, end, kvm_pgtable_stage2_unmap,
> +				   may_block));
>  }
>  
>  static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm