Re: [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM

Marc Zyngier <maz@xxxxxxxxxx> · Wed, 11 Dec 2024 17:40:36 +0000

On Wed, 11 Dec 2024 16:01:37 +0000,
Mikołaj Lenczewski <miko.lenczewski@xxxxxxx> wrote:
> 
> Currently, KVM does not handle the case of a stage 2 TLB conflict abort
> exception. The Arm ARM specifies that the worst-case handling of such an
> exception requires a `tlbi vmalls12e1`.

Not quite. It says (I_JCCRT):

<quote>
* For the EL1&0 translation regime, when stage 2 translations are in
  use, either VMALLS12E1 or ALLE1.
</quote>

> Perform such an invalidation when this exception is encountered.

What you fail to describe is *why* this is needed. You know it, I know
it, but not everybody does. A reference to the ARM ARM would
definitely be helpful.

> 
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@xxxxxxx>
> ---
>  arch/arm64/include/asm/esr.h | 8 ++++++++
>  arch/arm64/kvm/mmu.c         | 6 ++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index d1b1a33f9a8b..8a66f81ca291 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -121,6 +121,7 @@
>  #define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
>  #define ESR_ELx_FSC_SECC	(0x18)
>  #define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
> +#define ESR_ELx_FSC_TLBABT	(0x30)
>  
>  /* Status codes for individual page table levels */
>  #define ESR_ELx_FSC_ACCESS_L(n)	(ESR_ELx_FSC_ACCESS + (n))
> @@ -464,6 +465,13 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
>  	       (esr == ESR_ELx_FSC_ACCESS_L(0));
>  }
>  
> +static inline bool esr_fsc_is_tlb_conflict_abort(unsigned long esr)
> +{
> +	esr = esr & ESR_ELx_FSC;
> +
> +	return esr == ESR_ELx_FSC_TLBABT;
> +}
> +
>  /* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
>  static inline bool esr_iss_is_eretax(unsigned long esr)
>  {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c9d46ad57e52..c8c6f5a97a1b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
> +	if (esr_fsc_is_tlb_conflict_abort(esr)) {
> +		// does a `tlbi vmalls12e1is`

nit: this isn't a very useful comment.

> +		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
> +		return 1;
> +	}

That's not enough, unfortunately. A nested VM has *many* VMIDs (the
flattening of all translation contexts that the guest uses).

So you can either iterate over all the valid VMIDs owned by this
guest, or more simply issue a TLBI ALLE1, which will do the trick in a
much more efficient way.

The other thing is that you are using an IS invalidation, which is
farther reaching than necessary. Why would you invalidate the TLBs for
CPUs that are only innocent bystanders? A non-shareable invalidation
seems preferable to me.

> +
>  	if (esr_fsc_is_translation_fault(esr)) {
>  		/* Beyond sanitised PARange (which is the IPA limit) */
>  		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {

But it also begs the question: why only KVM, and not the host? This
handler will only take effect for a TLB Conflict abort delivered from
an EL1 guest to EL2.

However, it doesn't seem to me that the host is equipped to deal with
this sort of exception for itself. Shouldn't you start with that?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.