On Tue, 11 Jan 2022 15:35:36 +0000,
Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state
> (EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to
> exiting the EQS by calling guest_exit(). As the IRQ entry code will not
> wake RCU in this case, we may run the core IRQ code and IRQ handler
> without RCU watching, leading to various potential problems.
>
> Additionally, we do not inform lockdep or tracing that interrupts will
> be enabled during guest execution, which can lead to misleading traces
> and warnings that interrupts have been enabled for overly-long periods.
>
> This patch fixes these issues by using the new timing and context
> entry/exit helpers to ensure that interrupts are handled during guest
> vtime but with RCU watching, with a sequence:
>
> 	guest_timing_enter_irqoff();
>
> 	exit_to_guest_mode();
> 	< run the vcpu >
> 	enter_from_guest_mode();
>
> 	< take any pending IRQs >
>
> 	guest_timing_exit_irqoff();
>
> Since instrumentation may make use of RCU, we must also ensure that no
> instrumented code is run during the EQS. I've split out the critical
> section into a new kvm_arm_vcpu_enter_exit() helper which is marked
> noinstr.
>
> Fixes: 1b3d546daf85ed2b ("arm/arm64: KVM: Properly account for guest CPU time")
> Reported-by: Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Alexandru Elisei <alexandru.elisei@xxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Cc: James Morse <james.morse@xxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Cc: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> ---
>  arch/arm64/kvm/arm.c | 51 ++++++++++++++++++++++++++++----------------
>  1 file changed, 33 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e4727dc771bf..1721df2522c8 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -764,6 +764,24 @@ static bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu, int *ret)
>  		xfer_to_guest_mode_work_pending();
>  }
>
> +/*
> + * Actually run the vCPU, entering an RCU extended quiescent state (EQS) while
> + * the vCPU is running.
> + *
> + * This must be noinstr as instrumentation may make use of RCU, and this is not
> + * safe during the EQS.
> + */
> +static int noinstr kvm_arm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +
> +	exit_to_guest_mode();
> +	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> +	enter_from_guest_mode();
> +
> +	return ret;
> +}
> +
>  /**
>   * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>   * @vcpu:	The VCPU pointer
> @@ -854,9 +872,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		 * Enter the guest
>  		 */
>  		trace_kvm_entry(*vcpu_pc(vcpu));
> -		guest_enter_irqoff();
> +		guest_timing_enter_irqoff();
>
> -		ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> +		ret = kvm_arm_vcpu_enter_exit(vcpu);
>
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->stat.exits++;
> @@ -891,26 +909,23 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		kvm_arch_vcpu_ctxsync_fp(vcpu);
>
>  		/*
> -		 * We may have taken a host interrupt in HYP mode (ie
> -		 * while executing the guest). This interrupt is still
> -		 * pending, as we haven't serviced it yet!
> +		 * We must ensure that any pending interrupts are taken before
> +		 * we exit guest timing so that timer ticks are accounted as
> +		 * guest time. Transiently unmask interrupts so that any
> +		 * pending interrupts are taken.
>  		 *
> -		 * We're now back in SVC mode, with interrupts
> -		 * disabled. Enabling the interrupts now will have
> -		 * the effect of taking the interrupt again, in SVC
> -		 * mode this time.
> +		 * Per ARM DDI 0487G.b section D1.13.4, an ISB (or other
> +		 * context synchronization event) is necessary to ensure that
> +		 * pending interrupts are taken.
>  		 */
>  		local_irq_enable();
> +		isb();
> +		local_irq_disable();

Small nit: we may be able to elide this enable/isb/disable dance if a
read of ISR_EL1 returns 0.

> +
> +		guest_timing_exit_irqoff();
> +
> +		local_irq_enable();
>
> -		/*
> -		 * We do local_irq_enable() before calling guest_exit() so
> -		 * that if a timer interrupt hits while running the guest we
> -		 * account that tick as being spent in the guest. We enable
> -		 * preemption after calling guest_exit() so that if we get
> -		 * preempted we make sure ticks after that is not counted as
> -		 * guest time.
> -		 */
> -		guest_exit();
>  		trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>
>  		/* Exit types that need handling before we can be preempted */

Reviewed-by: Marc Zyngier <maz@xxxxxxxxxx>

	M.

-- 
Without deviation from the norm, progress is not possible.
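For illustration only, the elision suggested in the nit above could look
roughly like the sketch below. This is not part of Mark's patch: the helper
name is made up, and whether a plain ISR_EL1 read is sufficient here is an
open question. ISR_EL1 reads as non-zero when an IRQ, FIQ or SError is
pending at the current exception level, so a zero read would mean there is
nothing to take before exiting guest timing.

	#include <linux/irqflags.h>	/* local_irq_enable()/local_irq_disable() */
	#include <asm/barrier.h>	/* isb() */
	#include <asm/sysreg.h>		/* read_sysreg() */

	/*
	 * Sketch (hypothetical helper): only perform the transient
	 * unmask/ISB/mask dance when ISR_EL1 indicates an interrupt is
	 * actually pending.
	 */
	static inline void kvm_take_pending_host_irqs(void)
	{
		if (!read_sysreg(isr_el1))
			return;

		local_irq_enable();
		isb();			/* ensure the pending interrupt is taken */
		local_irq_disable();
	}

kvm_arch_vcpu_ioctl_run() would then call this in place of the open-coded
enable/isb/disable sequence, just before guest_timing_exit_irqoff().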