On Fri, Jul 16, 2021 at 03:23:07PM +0900, Sergey Senozhatsky wrote:
> On (21/07/16 14:41), Sergey Senozhatsky wrote:
> > @@ -657,6 +657,13 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >  	unsigned long js;
> >  	struct rcu_node *rnp;
> >  
> > +	/*
> > +	 * If a virtual machine is stopped by the host it can look to
> > +	 * the watchdog like an RCU stall. Check to see if the host
> > +	 * stopped the vm.
> > +	 */
> > +	kvm_check_and_clear_guest_paused();
> > +
> >  	lockdep_assert_irqs_disabled();
> >  	if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) ||
> >  	    !rcu_gp_in_progress())
> > @@ -699,14 +706,6 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> >  	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> >  
> > -		/*
> > -		 * If a virtual machine is stopped by the host it can look to
> > -		 * the watchdog like an RCU stall. Check to see if the host
> > -		 * stopped the vm.
> > -		 */
> > -		if (kvm_check_and_clear_guest_paused())
> > -			return;
> > -
> >  		/* We haven't checked in, so go dump stack. */
> >  		print_cpu_stall(gps);
> >  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> > @@ -717,14 +716,6 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> >  		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> >  
> > -		/*
> > -		 * If a virtual machine is stopped by the host it can look to
> > -		 * the watchdog like an RCU stall. Check to see if the host
> > -		 * stopped the vm.
> > -		 */
> > -		if (kvm_check_and_clear_guest_paused())
> > -			return;
> > -
> >  		/* They had a few time units to dump stack, so complain. */
> >  		print_other_cpu_stall(gs2, gps);
> >  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
>
> This patch depends on
> https://lore.kernel.org/lkml/20210716053405.1243239-1-senozhatsky@xxxxxxxxxxxx/
>
> If that x86/kvm patch lands, then we need to handle
> PVCLOCK_GUEST_STOPPED in watchdogs.

OK, please let me know how and when you would like to proceed.

> In theory, this patch opens a small race window, if the VCPU gets preempted
> after kvm_check_and_clear_guest_paused() (external interrupt, etc.)
> But it's hard to tell how likely the problem is.

There is always the option of attempting to provoke it, possibly while
artificially widening the race window.

							Thanx, Paul