Re: [PATCH v3 2/3] KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Marc,

On Thu, Mar 3, 2022 at 8:11 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:

Reiji,

Please add a cover letter to your patches. It actually is important to
track the changes as well as being an anchor in my email client.

Sure, I will add a cover letter for v4.


On Thu, 03 Mar 2022 03:54:07 +0000,
Reiji Watanabe <reijiw@xxxxxxxxxx> wrote:
>
> KVM allows userspace to configure either all EL1 32bit or 64bit vCPUs
> for a guest.  At vCPU reset, vcpu_allowed_register_width() checks
> if the vcpu's register width is consistent with all other vCPUs'.
> Since the checking is done even against vCPUs that are not initialized
> (KVM_ARM_VCPU_INIT has not been done) yet, the uninitialized vCPUs
> are erroneously treated as 64bit vCPU, which causes the function to
> incorrectly detect a mixed-width VM.
>
> Introduce KVM_ARCH_FLAG_EL1_32BIT and KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED
> bits for kvm->arch.flags.  A value of the EL1_32BIT bit indicates that
> the guest needs to be configured with all 32bit or 64bit vCPUs, and
> a value of the REG_WIDTH_CONFIGURED bit indicates if a value of the
> EL1_32BIT bit is valid (already set up). Values in those bits are set at
> the first KVM_ARM_VCPU_INIT for the guest based on KVM_ARM_VCPU_EL1_32BIT
> configuration for the vCPU.
>
> Check vcpu's register width against those new bits at the vcpu's
> KVM_ARM_VCPU_INIT (instead of against other vCPUs' register width).
>
> Fixes: 66e94d5cafd4 ("KVM: arm64: Prevent mixed-width VM creation")
> Signed-off-by: Reiji Watanabe <reijiw@xxxxxxxxxx>
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 25 +++++++++++------
>  arch/arm64/include/asm/kvm_host.h    |  8 ++++++
>  arch/arm64/kvm/arm.c                 | 41 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/reset.c               |  8 ------
>  4 files changed, 65 insertions(+), 17 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d62405ce3e6d..f4f960819888 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -20,6 +20,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/cputype.h>
>  #include <asm/virt.h>
> +#include <asm/kvm_mmu.h>

Huh... I wish we didn't drag that one here, it is eventually going to
hurt...

>
>  #define CURRENT_EL_SP_EL0_VECTOR     0x0
>  #define CURRENT_EL_SP_ELx_VECTOR     0x200
> @@ -45,7 +46,14 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
>
>  static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
> -     return !(vcpu->arch.hcr_el2 & HCR_RW);
> +     struct kvm *kvm;
> +
> +     kvm = is_kernel_in_hyp_mode() ? kern_hyp_va(vcpu->kvm) : vcpu->kvm;

Errr... On first approximation, this is the wrong way around. A VHE
kernel doesn't need any repainting of the address, while a nVHE kernel
does. Even more, a bit of context:

static inline bool is_kernel_in_hyp_mode(void)
{
        return read_sysreg(CurrentEL) == CurrentEL_EL2;
}

So not only the expression is the wrong way around, but it *cannot*
distinguish VHE and nVHE when running at EL2. You're just lucky that
the two bugs (on a single line) cancel each others.

The only sane way to write this is to *not* look at the mode you're
running in. kern_hyp_va() is designed to be nop'ed out on VHE.

Actually, I did it knowing kern_hyp_va() was nop on vhe and
kern_hyp_va() was needed for nvhe in hyp.  I wanted to make the
function work whether it is nvhe hyp or non-hyp, or vhe...


> +
> +     WARN_ON_ONCE(!test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED,
> +                            &kvm->arch.flags));
> +
> +     return test_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);
>  }

Given that this is used on the vcpu switch fast path at least twice
per run, we need something better. You probably want to offer
different primitives depending on the context:

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index d62405ce3e6d..daea0885c28d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -43,10 +43,22 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);

 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);

+#if defined (__KVM_VHE_HYPERVISOR__) || defined (__KVM_NVHE_HYPERVISOR__)
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
        return !(vcpu->arch.hcr_el2 & HCR_RW);
 }
+#else
+static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
+{
+       struct kvm *kvm = kern_hyp_va(vcpu->kvm);
+
+       WARN_ON_ONCE(!test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED,
+                              &kvm->arch_flags));
+
+       return test_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);
+}
+#endif

as you are guaranteed to have configured the width of the vcpu by the
time you hit start messing with it in the context of the hypervisor.

Thank you for the proposal!
I will fix that based on the proposal.


>
>  static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> @@ -72,15 +80,14 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>               vcpu->arch.hcr_el2 |= HCR_TVM;
>       }
>
> -     if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
> +     if (vcpu_el1_is_32bit(vcpu))
>               vcpu->arch.hcr_el2 &= ~HCR_RW;
> -
> -     /*
> -      * TID3: trap feature register accesses that we virtualise.
> -      * For now this is conditional, since no AArch32 feature regs
> -      * are currently virtualised.
> -      */
> -     if (!vcpu_el1_is_32bit(vcpu))
> +     else
> +             /*
> +              * TID3: trap feature register accesses that we virtualise.
> +              * For now this is conditional, since no AArch32 feature regs
> +              * are currently virtualised.
> +              */
>               vcpu->arch.hcr_el2 |= HCR_TID3;
>
>       if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 11a7ae747ded..5cde7f7b5042 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -125,6 +125,14 @@ struct kvm_arch {
>  #define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER   0
>       /* Memory Tagging Extension enabled for the guest */
>  #define KVM_ARCH_FLAG_MTE_ENABLED                    1
> +     /*
> +      * The guest's EL1 register width.  A value of KVM_ARCH_FLAG_EL1_32BIT
> +      * bit is valid only when KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED is set.
> +      * Otherwise, the guest's EL1 register width has not yet been
> +      * determined yet.
> +      */
> +#define KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED           2
> +#define KVM_ARCH_FLAG_EL1_32BIT                              3
>       unsigned long flags;
>
>       /*
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 9a2d240ef6a3..9ac75aa46e2f 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1101,6 +1101,43 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
>       return -EINVAL;
>  }
>
> +/*
> + * A guest can have either all EL1 32bit or 64bit vcpus only. It is
> + * indicated by a value of KVM_ARCH_FLAG_EL1_32BIT bit in kvm->arch.flags,
> + * which is valid only when KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED in
> + * kvm->arch.flags is set.
> + * This function checks if the vCPU's register width configuration is
> + * consistent with a value of the EL1_32BIT bit in kvm->arch.flags
> + * when the REG_WIDTH_CONFIGURED bit is set.
> + * Otherwise, the function sets a value of EL1_32BIT bit based on the vcpu's
> + * KVM_ARM_VCPU_EL1_32BIT configuration (and sets the REG_WIDTH_CONFIGURED
> + * bit of kvm->arch.flags).
> + */
> +static int kvm_register_width_check_or_init(struct kvm_vcpu *vcpu)

The naming is positively Java-esque! How about kvm_set_vm_width()
instead? Also, please document the error code.

Sure, I will fix the name, and document the error code.



> +{
> +     bool is32bit;
> +     bool allowed = true;
> +     struct kvm *kvm = vcpu->kvm;
> +
> +     is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
> +
> +     mutex_lock(&kvm->lock);
> +
> +     if (test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags)) {
> +             allowed = (is32bit ==
> +                        test_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags));
> +     } else {
> +             if (is32bit)
> +                     set_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);

nit: probably best written as:

                __assign_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags, is32bit);

> +
> +             set_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags);

Since this is only ever set whilst holding the lock, you can user the
__set_bit() version.

Thank you for the proposal. But since other CPUs could attempt
to set other bits without holding the lock, I don't think we
can use the non-atomic version here.


> +     }
> +
> +     mutex_unlock(&kvm->lock);
> +
> +     return allowed ? 0 : -EINVAL;
> +}
> +
>  static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>                              const struct kvm_vcpu_init *init)
>  {
> @@ -1140,6 +1177,10 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>
>       /* Now we know what it is, we can reset it. */
>       ret = kvm_reset_vcpu(vcpu);
> +
> +     if (!ret)
> +             ret = kvm_register_width_check_or_init(vcpu);

Why is that called *after* resetting the vcpu, which itself relies on
KVM_ARM_VCPU_EL1_32BIT, which we agreed to get rid of as much as
possible?

That's because I didn't want to set EL1_32BIT/REG_WIDTH_CONFIGURED
for the guest based on the vCPU for which KVM_ARM_VCPU_INIT would fail.
The flags can be set in the kvm_reset_vcpu() and cleared in
case of failure.  But then that temporary value could lead
KVM_ARM_VCPU_INIT for other vCPUs to fail, which I don't think
is nice to do.

Another option (almost the same though) I considered was as
follows, which just had kvm_reset_vcpu() use the new flag
when available (The following is the diff from the v3 patches).
KVM_ARM_VCPU_EL1_32BIT is used anyway by kvm_reset_vcpu()
and kvm_set_vm_width() though...

---
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 6c5f7677057d..3542eeb48e5d 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -181,11 +181,8 @@ static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
 	return 0;
 }
-static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
+static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu, bool is32bit)
 {
-	bool is32bit;
-
-	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
 	if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) && is32bit)
 		return false;
@@ -218,14 +215,27 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_reset_state reset_state;
 	int ret;
-	bool loaded;
+	bool loaded, is32bit;
+	bool allowed = true;
 	u32 pstate;
+ is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
+
 	mutex_lock(&vcpu->kvm->lock);
-	reset_state = vcpu->arch.reset_state;
-	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
+	if (test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &vcpu->kvm->arch.flags))
+		allowed = (is32bit == vcpu_el1_is_32bit(vcpu));
+	else
+		allowed = vcpu_allowed_register_width(vcpu, is32bit);
+
+	if (allowed) {
+		reset_state = vcpu->arch.reset_state;
+		WRITE_ONCE(vcpu->arch.reset_state.reset, false);
+	}
 	mutex_unlock(&vcpu->kvm->lock);
+ if (!allowed)
+		return -EINVAL;
+
 	/* Reset PMU outside of the non-preemptible section */
 	kvm_pmu_vcpu_reset(vcpu);
@@ -252,14 +262,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		}
 	}
- if (!vcpu_allowed_register_width(vcpu)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	switch (vcpu->arch.target) {
 	default:
-		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
+		if (is32bit) {
 			pstate = VCPU_RESET_PSTATE_SVC;
 		} else {
 			pstate = VCPU_RESET_PSTATE_EL1;
--

Thanks,
Reiji



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux