On Fri, May 21, 2021 at 07:24:03AM +0900, Sergey Senozhatsky wrote: > On (21/05/20 07:53), Paul E. McKenney wrote: > > On Thu, May 20, 2021 at 03:18:07PM +0900, Sergey Senozhatsky wrote: > > > On (21/05/18 16:15), Paul E. McKenney wrote: > > > > > > > > In the shorter term... PVCLOCK_GUEST_STOPPED is mostly for things like > > > > guest migration and debugger breakpoints, correct? Either way, I am > > > > wondering if rcu_cpu_stall_reset() should take a lighter touch. Right > > > > now, it effectively disables all stalls for the current grace period. > > > > Why not make it restart the stall timeout when the stoppage is detected? > > > > > > rcu_cpu_stall_reset() is used in many other places, not sure if we can > > > change its behaviour rcu_cpu_stall_reset(). > > > > There was some use case back in the day where they wanted an indefinite > > suppression of RCU CPU stall warnings for the current grace period, but > > all the current use cases look fine with restarting the stall timeout. > > > > However, please see below. > > > > > Maybe it'll be possible to just stop calling it from PV-clock and do > > > something like this > > > > This was in fact one of the things I was considering, at least until > > I convinced myself that I needed to ask some questions. > > > > One point of possibly unnecessary nervousness on my part is resetting > > the start of the grace period, which might confuse diagnostics. > > > > But how about something like this? > > > > void rcu_cpu_stall_reset(void) > > { > > WRITE_ONCE(rcu_state.jiffies_stall, > > jiffies + rcu_jiffies_till_stall_check()); > > } > > > > Would something like that work? > > This should work. > > On a side note. > > I wish we didn't have to put kvm_check_and_clear_guest_paused() all > over the place. > > We do load jiffies at the start of check_cpu_stall(). So, in theory, > we can just use that captured jiffies (which can become obsolete but > so will grace period timestamps) value and never read current system > jiffies because they can jump way forward. IOW > > jn = j + 3 * rcu_jiffies_till_stall_check() + 3; > > instead of > > jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3; > > Then we probably can remove kvm_check_and_clear_guest_paused(). > > But that "don't load current jiffies" is rather fragile. > > kvm_check_and_clear_guest_paused() is not pretty, but at least it's > explicit. If this works for you, I am very much in favor! Thanx, Paul > --- > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > index 49dda86a0e84..24f749bc1f90 100644 > --- a/kernel/rcu/tree_stall.h > +++ b/kernel/rcu/tree_stall.h > @@ -695,19 +695,11 @@ static void check_cpu_stall(struct rcu_data *rdp) > ULONG_CMP_GE(gps, js)) > return; /* No stall or GP completed since entering function. */ > rnp = rdp->mynode; > - jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3; > + jn = j + 3 * rcu_jiffies_till_stall_check() + 3; > if (rcu_gp_in_progress() && > (READ_ONCE(rnp->qsmask) & rdp->grpmask) && > cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { > > - /* > - * If a virtual machine is stopped by the host it can look to > - * the watchdog like an RCU stall. Check to see if the host > - * stopped the vm. > - */ > - if (kvm_check_and_clear_guest_paused()) > - return; > - > /* We haven't checked in, so go dump stack. */ > print_cpu_stall(gps); > if (READ_ONCE(rcu_cpu_stall_ftrace_dump)) > @@ -717,14 +709,6 @@ static void check_cpu_stall(struct rcu_data *rdp) > ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) && > cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) { > > - /* > - * If a virtual machine is stopped by the host it can look to > - * the watchdog like an RCU stall. Check to see if the host > - * stopped the vm. > - */ > - if (kvm_check_and_clear_guest_paused()) > - return; > - > /* They had a few time units to dump stack, so complain. */ > print_other_cpu_stall(gs2, gps); > if (READ_ONCE(rcu_cpu_stall_ftrace_dump))