Hi Marc,

On 5/10/21 10:49 AM, Marc Zyngier wrote:
> KVM currently updates PC (and the corresponding exception state)
> using a two phase approach: first by setting a set of flags,
> then by converting these flags into a state update when the vcpu
> is about to enter the guest.
>
> However, this creates a disconnect with userspace if the vcpu thread
> returns there with any exception/PC flag set. In this case, the exposed

The code seems to handle only the KVM_ARM64_PENDING_EXCEPTION flag. Is the
"PC flag" a reference to the KVM_ARM64_INCREMENT_PC flag?

> context is wrong, as userpsace doesn't have access to these flags

s/userpsace/userspace

> (they aren't architectural). It also means that these flags are
> preserved across a reset, which isn't expected.
>
> To solve this problem, force an explicit synchronisation of the
> exception state on vcpu exit to userspace. As an optimisation
> for nVHE systems, only perform this when there is something pending.
>
> Reported-by: Zenghui Yu <yuzenghui@xxxxxxxxxx>
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 5.11
> ---
>  arch/arm64/include/asm/kvm_asm.h   |  1 +
>  arch/arm64/kvm/arm.c               | 10 ++++++++++
>  arch/arm64/kvm/hyp/exception.c     |  4 ++--
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c |  8 ++++++++
>  4 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index d5b11037401d..5e9b33cbac51 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -63,6 +63,7 @@
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp			20
> +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc			21
>
>  #ifndef __ASSEMBLY__
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 1cb39c0803a4..d62a7041ebd1 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -897,6 +897,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>
>  	kvm_sigset_deactivate(vcpu);
>
> +	/*
> +	 * In the unlikely event that we are returning to userspace
> +	 * with pending exceptions or PC adjustment, commit these

I'm going to assume "PC adjustment" means the KVM_ARM64_INCREMENT_PC flag.
Please correct me if that's not true; if it is, then that flag isn't handled
below.

> +	 * adjustments in order to give userspace a consistent view of
> +	 * the vcpu state.
> +	 */
> +	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
> +					 KVM_ARM64_EXCEPT_MASK)))

The condition seems to suggest that it is valid to set
KVM_ARM64_EXCEPT_{AA32,AA64}_* without setting KVM_ARM64_PENDING_EXCEPTION,
which looks rather odd to me. Is that a valid use of the KVM_ARM64_EXCEPT_MASK
bits? If it's not (the existing code always sets the exception type together
with KVM_ARM64_PENDING_EXCEPTION), then I think checking only the
KVM_ARM64_PENDING_EXCEPTION flag would make the intention clearer.

Thanks,
Alex

> +		kvm_call_hyp(__kvm_adjust_pc, vcpu);
> +
>  	vcpu_put(vcpu);
>  	return ret;
>  }
> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index 0812a496725f..11541b94b328 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -331,8 +331,8 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  }
>
>  /*
> - * Adjust the guest PC on entry, depending on flags provided by EL1
> - * for the purpose of emulation (MMIO, sysreg) or exception injection.
> + * Adjust the guest PC (and potentially exception state) depending on
> + * flags provided by the emulation code.
>   */
>  void __kvm_adjust_pc(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index f36420a80474..1632f001f4ed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -28,6 +28,13 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
>  	cpu_reg(host_ctxt, 1) = __kvm_vcpu_run(kern_hyp_va(vcpu));
>  }
>
> +static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
> +{
> +	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> +
> +	__kvm_adjust_pc(kern_hyp_va(vcpu));
> +}
> +
>  static void handle___kvm_flush_vm_context(struct kvm_cpu_context *host_ctxt)
>  {
>  	__kvm_flush_vm_context();
> @@ -170,6 +177,7 @@ typedef void (*hcall_t)(struct kvm_cpu_context *);
>
>  static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__kvm_vcpu_run),
> +	HANDLE_FUNC(__kvm_adjust_pc),
>  	HANDLE_FUNC(__kvm_flush_vm_context),
>  	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
>  	HANDLE_FUNC(__kvm_tlb_flush_vmid),