When the host suspends a VM it may signal the guest that it is being
suspended via the KVM_KVMCLOCK_CTRL ioctl, so that once resumed, guest
VCPUs can discover the PVCLOCK_GUEST_STOPPED bit and touch watchdogs to
update stale timeouts. The way it's implemented is that every
kvm_clock_read() calls pvclock_clocksource_read(), which tests
PVCLOCK_GUEST_STOPPED and invokes pvclock_touch_watchdogs() when needed.

This scheme does not always appear to work as intended. For instance,
when lockdep is enabled we have the following:

<IRQ>
 apic_timer_interrupt()
  smp_apic_timer_interrupt()
   hrtimer_interrupt()
    _raw_spin_lock_irqsave()
     lock_acquire()
      __lock_acquire()
       sched_clock_cpu()
        sched_clock()
         kvm_sched_clock_read()
          kvm_clock_read()
           pvclock_clocksource_read()
            pvclock_touch_watchdogs()

Since this is the VM and VCPU resume path, jiffies may still be outdated
here, which is often the case on my device. pvclock_clocksource_read()
clears PVCLOCK_GUEST_STOPPED and touches watchdogs, but it uses stale
jiffies: 4294740764 (for example).

Now the sched tick IRQ comes in and invokes the RCU watchdog:

<IRQ>
 apic_timer_interrupt()
  smp_apic_timer_interrupt()
   hrtimer_interrupt()
    __hrtimer_run_queues()
     tick_sched_timer()
      tick_sched_handle()
       update_process_times()
        rcu_sched_clock_irq()

At this point, however, jiffies are already updated and include the VM
suspension time (approx 80 seconds in this case): 4294819216. But
PVCLOCK_GUEST_STOPPED is already cleared and the watchdogs were touched
with outdated jiffies, so the RCU watchdog concludes it's a stall.

There are probably more scenarios under which resuming VCPUs can invoke
kvm_clock_read() too early.

Both the lockup and RCU watchdogs call kvm_check_and_clear_guest_paused()
from hard IRQ context, so we can probably remove the check from
pvclock_clocksource_read() and avoid premature PVCLOCK_GUEST_STOPPED
handling from random paths. That is, since
kvm_check_and_clear_guest_paused() is for watchdogs, only watchdogs
should use it.

Signed-off-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
---
 arch/x86/kernel/kvmclock.c | 2 +-
 arch/x86/kernel/pvclock.c  | 5 -----
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ad273e5861c1..af90b889e923 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -150,8 +150,8 @@ bool kvm_check_and_clear_guest_paused(void)
 		return ret;
 
 	if ((src->pvti.flags & PVCLOCK_GUEST_STOPPED) != 0) {
-		src->pvti.flags &= ~PVCLOCK_GUEST_STOPPED;
 		pvclock_touch_watchdogs();
+		src->pvti.flags &= ~PVCLOCK_GUEST_STOPPED;
 		ret = true;
 	}
 	return ret;
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index eda37df016f0..b176e083e543 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -77,11 +77,6 @@ u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 		flags = src->flags;
 	} while (pvclock_read_retry(src, version));
 
-	if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
-		src->flags &= ~PVCLOCK_GUEST_STOPPED;
-		pvclock_touch_watchdogs();
-	}
-
 	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
 		(flags & PVCLOCK_TSC_STABLE_BIT))
 		return ret;
-- 
2.32.0.402.g57bb445576-goog
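
For illustration only, the ordering problem described in the commit message can be modelled in a few lines of userspace C. This is a toy sketch, not kernel code: struct guest_model and every function name below are hypothetical, and the 21000-jiffy stall threshold is an assumption (roughly 21 seconds if HZ were 1000, which the message does not state). Only the two jiffies values are taken from the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one resuming VCPU. All names here are hypothetical. */
struct guest_model {
	uint64_t jiffies;        /* tick counter, stale right after resume */
	uint64_t watchdog_stamp; /* jiffies recorded by the last watchdog touch */
	bool guest_stopped;      /* stands in for PVCLOCK_GUEST_STOPPED */
};

static void touch_watchdog(struct guest_model *g)
{
	g->watchdog_stamp = g->jiffies;
}

/* Clock read as before the patch: consumes the flag on any early read. */
static void clock_read_old(struct guest_model *g)
{
	if (g->guest_stopped) {
		g->guest_stopped = false;
		touch_watchdog(g); /* records stale jiffies */
	}
}

/* Clock read as after the patch: leaves the flag for the watchdog. */
static void clock_read_new(struct guest_model *g)
{
	(void)g;
}

/* Watchdog check from the tick IRQ; returns true if it would report a stall. */
static bool watchdog_reports_stall(struct guest_model *g, uint64_t timeout)
{
	assert(timeout > 0);
	if (g->guest_stopped) {
		/* Same order as the patched kvm_check_and_clear_guest_paused(). */
		touch_watchdog(g);
		g->guest_stopped = false;
		return false;
	}
	return g->jiffies - g->watchdog_stamp > timeout;
}

/* Replays the resume sequence: early clock read, jiffies catch up, tick IRQ. */
bool resume_reports_stall(bool early_read_clears_flag)
{
	struct guest_model g = {
		.jiffies = 4294740764ULL,        /* stale value from the example */
		.watchdog_stamp = 4294740764ULL,
		.guest_stopped = true,
	};

	if (early_read_clears_flag)
		clock_read_old(&g);
	else
		clock_read_new(&g);

	g.jiffies = 4294819216ULL; /* now includes the suspension time */

	return watchdog_reports_stall(&g, 21000 /* assumed threshold */);
}
```

In this model resume_reports_stall(true) replays the pre-patch ordering: the flag is consumed while jiffies are stale, so the later tick sees a gap of 78452 jiffies and reports a stall. resume_reports_stall(false) leaves the flag for the watchdog, which touches its timestamp with up-to-date jiffies and reports nothing.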