On Sun, Jun 26, 2022 at 06:52:40AM -0700, Paul E. McKenney wrote: > On Sun, Jun 26, 2022 at 01:45:32PM +0000, Joel Fernandes wrote: > > On Sat, Jun 25, 2022 at 09:06:22PM -0700, Paul E. McKenney wrote: > > > On Wed, Jun 22, 2022 at 10:50:59PM +0000, Joel Fernandes (Google) wrote: > > > > We notice that rcu_barrier() can take a really long time. It appears > > > > that this can happen when all CBs are lazy and the timer does not fire > > > > yet. So after flushing, nothing wakes up GP thread. This patch forces > > > > GP thread to wake when bypass flushing happens, this fixes the > > > > rcu_barrier() delays with lazy CBs. > > > > > > I am wondering if there is a bug in non-rcu_barrier() lazy callback > > > processing hiding here as well? > > > > I don't think so because in both nocb_try_bypass and nocb_gp_wait, we are not > > going to an indefinite sleep after the flush. However, with rcu_barrier() , > > there is nothing to keep the RCU GP thread awake. That's my theory at least. > > In practice, I have not been able to reproduce this issue with > > non-rcu_barrier(). > > > > With rcu_barrier() I happen to hit it thanks to the rcuscale changes I did. > > That's an interesting story! As I apply call_rcu_lazy() to the file table > > code, turns out that on boot, the initram unpacking code continously triggers > > call_rcu_lazy(). This happens apparently in a different thread than the one > > that rcuscale is running in. In rcuscale, I did rcu_barrier() at init time > > and this stalled for a long time to my surprise, and this patch fixed it. > > Cool! > > Then should this wake_nocb_gp() instead go into the rcu_barrier() > code path? As shown below, wouldn't we be doing some spurious wakeups? You are right. In my testing, I don't see any issue with the extra wake up which is going to happen anyway and my thought was why not do it so that a future bypass flush from some other path forgets to call wake up. I'll refine it to be rcu-barrier-only then. thanks! - Joel