On (21/05/17 09:23), Paul E. McKenney wrote: > On Sun, May 16, 2021 at 07:27:16PM +0900, Sergey Senozhatsky wrote: > > Soft watchdog timer function checks if a virtual machine > > was suspended and hence what looks like a lockup in fact > > is a false positive. > > > > This is what kvm_check_and_clear_guest_paused() does: it > > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host) > > and if it's set then we need to touch all watchdogs and bail > > out. > > > > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED > > check works fine. > > > > There is, however, one more watchdog that runs from IRQ, so > > watchdog timer fn races with it, and that watchdog is not aware > > of PVCLOCK_GUEST_STOPPED - RCU stall detector. > > > > apic_timer_interrupt() > > smp_apic_timer_interrupt() > > hrtimer_interrupt() > > __hrtimer_run_queues() > > tick_sched_timer() > > tick_sched_handle() > > update_process_times() > > rcu_sched_clock_irq() > > > > This triggers RCU stalls on our devices during VM resume. > > > > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU > > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused() > > then there is nothing on this VCPU that touches watchdogs and > > RCU reads stale gp stall timestamp and new jiffies value, which > > makes it think that RCU has stalled. > > > > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and > > don't report RCU stalls when we resume the VM. > > Good point! > > But if I understand your patch correctly, if the virtual machine is > stopped at any point during a grace period, even if only for a short time, > stall warnings are suppressed for that grace period forever, courtesy of > the call to rcu_cpu_stall_reset(). So, if something really is stalling, > and the virtual machine just happens to stop for a few milliseconds, the > stall warning is completely suppressed. Which would make it difficult > to debug the underlying stall condition. > > Is it possible to provide RCU with information on the duration of the > virtual-machine stoppage so that RCU could adjust the timing of the > stall warning? Maybe by having something like rcu_cpu_stall_reset() > that takes the duration of the stoppage in jiffies? Good questions! And I think I've some bad news and some good news. As far as I can tell, none of the PVCLOCK_GUEST_STOPPED handlers take the stoppage duration into consideration. For instance, as soon as watchdog timer IRQ detects a potential softlockup it checks PVCLOCK_GUEST_STOPPED and touches all watchdogs, including RCU: watchdog_timer_fn() kvm_check_and_clear_guest_paused() pvclock_touch_watchdogs() rcu_cpu_stall_reset() // + the remaining watchdogs But things get more complex. pvclock_clocksource_read() also checks PVCLOCK_GUEST_STOPPED and calls pvclock_touch_watchdogs(). And this path is executed rather often. For instance, apic_timer_interrupt() smp_apic_timer_interrupt() hrtimer_interrupt() __hrtimer_run_queues() hrtimer_wakeup() try_to_wake_up() update_rq_clock() sched_clock_cpu() sched_clock() kvm_sched_clock_read() kvm_clock_read() pvclock_clocksource_read() pvclock_touch_watchdogs() rcu_cpu_stall_reset() // + the remaining watchdogs Or do_IRQ irq_exit sched_clock_cpu sched_clock kvm_sched_clock_read kvm_clock_read pvclock_clocksource_read pvclock_touch_watchdogs rcu_cpu_stall_reset() // + the remaining watchdogs And so on... You may wonder what are the good news then. Well. I'd say that my patch (is not beautiful but it) does not add a lot of additional or new damage. And it still fixes the valid race condition, as far as I'm concerned. I think we need to rework how pvclock_touch_watchdogs() does things internally, basically what you suggested, and this can be a separate effort.