On Fri, Sep 02, 2022 at 07:09:39PM -0400, Joel Fernandes wrote: > On 9/2/2022 11:21 AM, Frederic Weisbecker wrote: > >> @@ -3904,7 +3943,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp) > >> rdp->barrier_head.func = rcu_barrier_callback; > >> debug_rcu_head_queue(&rdp->barrier_head); > >> rcu_nocb_lock(rdp); > >> - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies)); > >> + WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false, > >> + /* wake gp thread */ true)); > > > > It's a bad sign when you need to start commenting your boolean parameters :) > > Perhaps use a single two-bit flag instead of two booleans, for readability? > > That's fair, what do you mean 2-bit flag? Are you saying, we encode the last 2 > parameters to flush bypass in a u*? Yeah exactly. Such as rcu_nocb_flush_bypass(rdp, NULL, jiffies, FLUSH_LAZY | FLUSH_WAKE) > > > Also that's a subtle change which purpose isn't explained. It means that > > rcu_barrier_entrain() used to wait for the bypass timer in the worst case > > but now we force rcuog into it immediately. Should that be a separate change? > > It could be split, but it is laziness that amplifies the issue so I thought of > keeping it in the same patch. I don't mind one way or the other. Ok then lets keep it here but please add a comment for the reason to force wake here. > >> + case RCU_NOCB_WAKE_BYPASS: > >> + mod_jif = 2; > >> + break; > >> + > >> + case RCU_NOCB_WAKE: > >> + case RCU_NOCB_WAKE_FORCE: > >> + // For these, make it wake up the soonest if we > >> + // were in a bypass or lazy sleep before. > >> if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE) > >> - mod_timer(&rdp_gp->nocb_timer, jiffies + 1); > >> - if (rdp_gp->nocb_defer_wakeup < waketype) > >> - WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype); > >> + mod_jif = 1; > >> + break; > >> } > >> > >> + if (mod_jif) > >> + mod_timer(&rdp_gp->nocb_timer, jiffies + mod_jif); > >> + > >> + if (rdp_gp->nocb_defer_wakeup < waketype) > >> + WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype); > > > > So RCU_NOCB_WAKE_BYPASS and RCU_NOCB_WAKE_LAZY don't override the timer state > > anymore? Looks like something is missing. > > My goal was to make sure that NOCB_WAKE_LAZY wake keeps the timer lazy. If I > don't do this, then when CPU enters idle, it will immediately do a wake up via > this call: > > rcu_nocb_need_deferred_wakeup(rdp_gp, RCU_NOCB_WAKE) But if the timer is in RCU_NOCB_WAKE_LAZY mode, that shouldn't be a problem. > > That was almost always causing lazy CBs to be non-lazy thus negating all the > benefits. > > Note that bypass will also have same issue where the bypass CB will not wait for > intended bypass duration. To make it consistent with lazy, I made bypass also > not override nocb_defer_wakeup. I'm surprised because rcu_nocb_flush_deferred_wakeup() should only do the wake up if the timer is RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE. Or is that code buggy somehow? Actually your change is modifying the timer delay without changing the timer mode, which may shortcut rcu_nocb_flush_deferred_wakeup() check and actually make it perform early upon idle loop entry. Or am I missing something? > > I agree its not pretty, but it works and I could not find any code path where it > does not work. That said, I am open to ideas for changing this and perhaps some > of these unneeded delays with bypass CBs can be split into separate patches. > > Regarding your point about nocb_defer_wakeup state diverging from the timer > programming, that happens anyway here in current code: > > 283 } else { > 284 if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE) > 285 mod_timer(&rdp_gp->nocb_timer, jiffies + 1); > 286 if (rdp_gp->nocb_defer_wakeup < waketype) > 287 WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype); > 288 } How so? > >> @@ -705,12 +816,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) > >> my_rdp->nocb_gp_gp = needwait_gp; > >> my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0; > >> > >> - if (bypass && !rcu_nocb_poll) { > >> - // At least one child with non-empty ->nocb_bypass, so set > >> - // timer in order to avoid stranding its callbacks. > >> - wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS, > >> - TPS("WakeBypassIsDeferred")); > >> + // At least one child with non-empty ->nocb_bypass, so set > >> + // timer in order to avoid stranding its callbacks. > >> + if (!rcu_nocb_poll) { > >> + // If bypass list only has lazy CBs. Add a deferred > >> + // lazy wake up. > >> + if (lazy && !bypass) { > >> + wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY, > >> + TPS("WakeLazyIsDeferred")); > > > > What if: > > > > 1) rdp(1) has lazy callbacks > > 2) all other rdp's have no callback at all > > 3) nocb_gp_wait() runs through all rdp's, everything is handled, except for > > these lazy callbacks > > 4) It reaches the above path, ready to arm the RCU_NOCB_WAKE_LAZY timer, > > but it hasn't yet called wake_nocb_gp_defer() > > 5) Oh but rdp(2) queues a non-lazy callback. interrupts are disabled so it defers > > the wake up to nocb_gp_wait() with arming the timer in RCU_NOCB_WAKE. > > 6) nocb_gp_wait() finally calls wake_nocb_gp_defer() and override the timeout > > to several seconds ahead. > > 7) No more callbacks queued, the non-lazy callback will have to wait several > > seconds to complete. > > > > Or did I miss something? > > In theory, I can see this being an issue. In practice, I have not seen it to > be. What matters is that the issue looks plausible. > In my view, the nocb GP thread should not go to sleep in the first place if > there are any non-bypass CBs being queued. If it does, then that seems an > optimization-related bug. Yeah, it's a constraint introduced by the optimized delayed wake up. By why would RCU_NOCB_WAKE_LAZY need to overwrite RCU_NOCB_WAKE in the first place? > > That said, we can make wake_nocb_gp_defer() more robust perhaps by making it not > overwrite the timer if the wake-type requested is weaker than RCU_NOCB_WAKE, > however that should not cause the going-into-idle issues I pointed. Whether the > idle time issue will happen, I have no idea. But in theory, will that address > your concern above? Yes, but I'm still confused by this idle time issue. > > > Note that the race exists with RCU_NOCB_WAKE_BYPASS > > but it's only about one jiffy delay, not seconds. > > Well, 2 jiffies. But yeah. > > Thanks, so far I do not see anything that cannot be fixed on top of this patch > but you raised some good points. Maybe we ought to rewrite the idle path to not > disturb lazy CBs in a different way, or something (while keeping the timer state > consistent with the programming of the timer in wake_nocb_gp_defer()). I'd rather see an updated patch (not the whole patchset but just this one) rather than deltas, just to make sure I'm not missing something in the whole picture. Thanks.