Re: [PATCH v2 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information

On Wed, May 29, 2024 at 03:56:25PM +0100, Marc Zyngier wrote:
> In order to be able to make S2 TLB invalidations more performant on NV,
> let's use a scheme derived from the FEAT_TTL extension.
> 
> If bits [56:55] in the leaf descriptor translating the address in the
> corresponding shadow S2 are non-zero, they indicate a level which can
> be used as an invalidation range. This allows further reduction of the
> systematic over-invalidation that takes place otherwise.
> 
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> ---
>  arch/arm64/kvm/nested.c | 83 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 8570b5bd0289..5ab5c43c571b 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -4,6 +4,7 @@
>   * Author: Jintack Lim <jintack.lim@xxxxxxxxxx>
>   */
>  
> +#include <linux/bitfield.h>
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> @@ -421,12 +422,92 @@ static unsigned int ttl_to_size(u8 ttl)
>  	return max_size;
>  }
>  
> +/*
> + * Compute the equivalent of the TTL field by parsing the shadow PT.  The
> + * granule size is extracted from the cached VTCR_EL2.TG0 while the level is
> + * retrieved from the first entry carrying the level as a tag.
> + */
> +static u8 get_guest_mapping_ttl(struct kvm_s2_mmu *mmu, u64 addr)
> +{

Can you add a lockdep assertion that the MMU lock is held for write
here? At least for me this is far enough away from the 'real' page table
walk that it wasn't clear what locks were held at this point.
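Something like the below at the top of the function, perhaps. Untested
sketch, and it assumes the write side of the MMU lock is what's actually
held on this path (and that kvm_s2_mmu_to_kvm() is usable from here):

	/* Parsing the shadow S2 requires the PTs to stay stable */
	lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(mmu)->mmu_lock);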

> +	u64 tmp, sz = 0, vtcr = mmu->tlb_vtcr;
> +	kvm_pte_t pte;
> +	u8 ttl, level;
> +
> +	switch (vtcr & VTCR_EL2_TG0_MASK) {
> +	case VTCR_EL2_TG0_4K:
> +		ttl = (TLBI_TTL_TG_4K << 2);
> +		break;
> +	case VTCR_EL2_TG0_16K:
> +		ttl = (TLBI_TTL_TG_16K << 2);
> +		break;
> +	case VTCR_EL2_TG0_64K:
> +	default:	    /* IMPDEF: treat any other value as 64k */
> +		ttl = (TLBI_TTL_TG_64K << 2);
> +		break;
> +	}
> +
> +	tmp = addr;
> +
> +again:
> +	/* Iteratively compute the block sizes for a particular granule size */
> +	switch (vtcr & VTCR_EL2_TG0_MASK) {
> +	case VTCR_EL2_TG0_4K:
> +		if	(sz < SZ_4K)	sz = SZ_4K;
> +		else if (sz < SZ_2M)	sz = SZ_2M;
> +		else if (sz < SZ_1G)	sz = SZ_1G;
> +		else			sz = 0;
> +		break;
> +	case VTCR_EL2_TG0_16K:
> +		if	(sz < SZ_16K)	sz = SZ_16K;
> +		else if (sz < SZ_32M)	sz = SZ_32M;
> +		else			sz = 0;
> +		break;
> +	case VTCR_EL2_TG0_64K:
> +	default:	    /* IMPDEF: treat any other value as 64k */
> +		if	(sz < SZ_64K)	sz = SZ_64K;
> +		else if (sz < SZ_512M)	sz = SZ_512M;
> +		else			sz = 0;
> +		break;
> +	}
> +
> +	if (sz == 0)
> +		return 0;
> +
> +	tmp &= ~(sz - 1);
> +	if (kvm_pgtable_get_leaf(mmu->pgt, tmp, &pte, NULL))
> +		goto again;

Assuming we're virtualizing a larger TG than what's in use at L0, this
may not actually find a valid leaf, even though one exists within the
span of a single virtual TLB entry.

For example, if we're using 4K at L0 and 16K at L1, we could have:

	[ ----- valid 16K entry ------- ]

mapped as:

	[ ----- | ----- | valid | ----- ]

in the shadow S2. kvm_pgtable_get_leaf() will always return the first
splintered page, which could be invalid.

What I'm getting at is: should this use a bespoke table walker that
scans for a valid TTL in the range of [addr, addr + sz)? It may make
sense to back off a bit more aggressively and switch to a conservative,
unscoped TLBI to avoid visiting too many PTEs.
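Roughly what I have in mind, as an untested sketch -- MAX_TTL_PROBES is
an arbitrary cutoff I made up:

	/*
	 * Probe [addr, addr + sz) at the host granule stride for a valid
	 * leaf we can derive the TTL from. Give up after a few probes so
	 * we don't churn through the entire range; the caller would then
	 * fall back to an unscoped TLBI.
	 */
	#define MAX_TTL_PROBES	16

	static bool find_tagged_leaf(struct kvm_s2_mmu *mmu, u64 addr, u64 sz,
				     kvm_pte_t *ptep)
	{
		u64 ipa = addr & ~(sz - 1);
		u64 end = ipa + sz;
		int probes = 0;

		while (ipa < end && probes++ < MAX_TTL_PROBES) {
			if (!kvm_pgtable_get_leaf(mmu->pgt, ipa, ptep, NULL) &&
			    kvm_pte_valid(*ptep))
				return true;

			ipa += PAGE_SIZE;
		}

		return false;
	}

where bailing out (returning false) would take us straight to the
unscoped invalidation rather than retrying at the next block size.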

-- 
Thanks,
Oliver