On Thu, Jul 15, 2021 at 06:09:45PM +0900, Sergey Senozhatsky wrote:
> On (21/05/22 00:56), Sergey Senozhatsky wrote:
> > Soft watchdog timer function checks if a virtual machine
> > was suspended and hence what looks like a lockup in fact
> > is a false positive.
> >
> > This is what kvm_check_and_clear_guest_paused() does: it
> > tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
> > and if it's set then we need to touch all watchdogs and bail
> > out.
> >
> > Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
> > check works fine.
> >
> > There is, however, one more watchdog that runs from IRQ, so
> > watchdog timer fn races with it, and that watchdog is not aware
> > of PVCLOCK_GUEST_STOPPED - RCU stall detector.
> >
> > apic_timer_interrupt()
> >  smp_apic_timer_interrupt()
> >   hrtimer_interrupt()
> >    __hrtimer_run_queues()
> >     tick_sched_timer()
> >      tick_sched_handle()
> >       update_process_times()
> >        rcu_sched_clock_irq()
> >
> > This triggers RCU stalls on our devices during VM resume.
> >
> > If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
> > before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
> > then there is nothing on this VCPU that touches watchdogs and
> > RCU reads stale gp stall timestamp and new jiffies value, which
> > makes it think that RCU has stalled.
> >
> > Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
> > don't report RCU stalls when we resume the VM.
>
> Hello Paul,
>
> I've noticed that this patch set didn't make it to Linus's tree.
> Was it intentional?

This patch (and the 18 preceding it) didn't make the cutoff for the
just-past merge window. If this patch is urgent, please let me know
and I can push it, with luck by the end of next week.

If that one is urgent, are these two also?

817690fd18af ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
9ed9bf0d17cd ("rcu: Start timing stall repetitions after warning complete")

If so, it is better to handle them as a group than separately.

The cutoff for a given merge window is normally shortly after the close
of the previous merge window. This time, I am a bit slow creating
branches, but the cutoff for the v5.15 merge window should be by the
end of the week. This is a bit more lag than most subsystems, but this
is after all RCU. As always, if a given commit is urgent, please let
me know and I will see what I can do to fast-track it.

For reference:

https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html

Again, if this one needs to hit mainline before the v5.15 merge window,
please let me know.

							Thanx, Paul