On Sat, Jun 25, 2022 at 09:06:22PM -0700, Paul E. McKenney wrote: > On Wed, Jun 22, 2022 at 10:50:59PM +0000, Joel Fernandes (Google) wrote: > > We notice that rcu_barrier() can take a really long time. It appears > > that this can happen when all CBs are lazy and the timer does not fire > > yet. So after flushing, nothing wakes up GP thread. This patch forces > > GP thread to wake when bypass flushing happens, this fixes the > > rcu_barrier() delays with lazy CBs. > > I am wondering if there is a bug in non-rcu_barrier() lazy callback > processing hiding here as well? I don't think so because in both nocb_try_bypass and nocb_gp_wait, we are not going to an indefinite sleep after the flush. However, with rcu_barrier() , there is nothing to keep the RCU GP thread awake. That's my theory at least. In practice, I have not been able to reproduce this issue with non-rcu_barrier(). With rcu_barrier() I happen to hit it thanks to the rcuscale changes I did. That's an interesting story! As I apply call_rcu_lazy() to the file table code, turns out that on boot, the initram unpacking code continously triggers call_rcu_lazy(). This happens apparently in a different thread than the one that rcuscale is running in. In rcuscale, I did rcu_barrier() at init time and this stalled for a long time to my surprise, and this patch fixed it. thanks, - Joel > > Thanx, Paul > > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> > > --- > > kernel/rcu/tree_nocb.h | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h > > index 2f5da12811a5..b481f1ea57c0 100644 > > --- a/kernel/rcu/tree_nocb.h > > +++ b/kernel/rcu/tree_nocb.h > > @@ -325,6 +325,8 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp, > > rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl); > > WRITE_ONCE(rdp->nocb_bypass_first, j); > > rcu_nocb_bypass_unlock(rdp); > > + > > + wake_nocb_gp(rdp, true); > > return true; > > } > > > > -- > > 2.37.0.rc0.104.g0611611a94-goog > >