When a guest gets scheduled, KVM performs a "load" operation, which for the timer includes evaluating the virtual "active" state of the interrupt, and replicating it on the physical side. This ensures that the deactivation in the guest will also take place in the physical GIC distributor. If the interrupt is not yet active, we flag it as inactive on the physical side. This means that on restoring the timer registers, if the timer has expired, we'll immediately take an interrupt. That's absolutely fine, as the interrupt will then be flagged as active on the physical side. What this assumes though is that we'll enter the guest right after having taken the interrupt, and that the guest will quickly ACK the interrupt, making it active at on the virtual side. It turns out that quite often, this assumption doesn't really hold. The guest may be preempted on the back on this interrupt, either from kernel space or whilst running at EL1 when a host interrupt fires. When this happens, we repeat the whole sequence on the next load (interrupt marked as inactive, timer registers restored, interrupt fires). And if it takes a really long time for a guest to activate the interrupt (as it does with nested virt), we end-up with many such events in quick succession, leading to the guest only making very slow progress. This can also be seen with the number of virtual timer interrupt on the host being far greater than the same number in the guest. An easy way to fix this is to evaluate the timer state when performing the "load" operation, just like we do when the interrupt actually fires. If the timer has a pending virtual interrupt at this stage, then we can safely flag the physical interrupt as being active, which prevents spurious exits. Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx> --- virt/kvm/arm/arch_timer.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c index 17f9de73cc8a..af8f2f1d01cc 100644 --- a/virt/kvm/arm/arch_timer.c +++ b/virt/kvm/arm/arch_timer.c @@ -487,12 +487,21 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx) { struct kvm_vcpu *vcpu = ctx->vcpu; - bool phys_active; + bool phys_active = false; + + /* + * Update the timer output so that it is likely to match the + * state we're about to restore. If the timer expires between + * this point and the register restoration, we'll take the + * interrupt anyway. + */ + kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx); if (irqchip_in_kernel(vcpu->kvm)) phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq); - else - phys_active = ctx->irq.level; + + phys_active |= ctx->irq.level; + set_timer_irq_phys_active(ctx, phys_active); } -- 2.20.1