On Wed, Mar 22, 2023 at 08:44:54PM +0100, Frederic Weisbecker wrote:
> The shrinker resets the lazy callbacks counter in order to trigger the
> pending lazy queue flush through the rcuog kthread. The counter reset is
> protected by the ->nocb_lock against concurrent accesses...except
> for one of them. Here is a list of existing synchronized readers/writer:
> 
> 1) The first lazy enqueuer (incrementing ->lazy_len to 1) does so under
>    ->nocb_lock and ->nocb_bypass_lock.
> 
> 2) The further lazy enqueuers (incrementing ->lazy_len above 1) do so
>    under ->nocb_bypass_lock _only_.
> 
> 3) The lazy flush checks ->lazy_len and resets it to 0 under ->nocb_lock
>    and ->nocb_bypass_lock.
> 
> The shrinker protects its ->lazy_len reset against cases 1) and 3) but
> not against 2). As such, setting ->lazy_len to 0 under the ->nocb_lock
> may be cancelled right away by an overwrite from an enqueuer, leading
> rcuog to ignore the flush.
> 
> To avoid that, use the proper bypass flush API, which takes care of all
> those details.
> 
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>

Again, good catch, and this one looks good to me.  So what am I missing?  ;-)

							Thanx, Paul

> ---
>  kernel/rcu/tree_nocb.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index dd9b655ae533..cb57e8312231 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1356,7 +1356,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			continue;
>  
>  		rcu_nocb_lock_irqsave(rdp, flags);
> -		WRITE_ONCE(rdp->lazy_len, 0);
> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
>  		rcu_nocb_unlock_irqrestore(rdp, flags);
>  		wake_nocb_gp(rdp, false);
>  		sc->nr_to_scan -= _count;
> -- 
> 2.34.1
> 
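
For readers following along, below is a minimal user-space sketch of the
race the patch closes. It is not kernel code: the two pthread mutexes
stand in for ->nocb_lock and ->nocb_bypass_lock, and the file, function,
and thread names are made up for illustration. It shows that two writers
taking *different* locks are not serialized against each other, so the
shrinker's reset can be overwritten, and that taking the bypass lock as
well (which, as the patch relies on, rcu_nocb_flush_bypass() does
internally) closes the window.

/*
 * race_sketch.c -- illustrative only, not kernel code.
 * Build: gcc -pthread race_sketch.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t nocb_lock = PTHREAD_MUTEX_INITIALIZER;   /* stands in for ->nocb_lock */
static pthread_mutex_t bypass_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for ->nocb_bypass_lock */
static long lazy_len = 1;	/* the bypass queue already holds one lazy callback */

/* Case 2) above: a further enqueuer increments under the bypass lock only. */
static void *enqueuer(void *arg)
{
	pthread_mutex_lock(&bypass_lock);
	lazy_len++;		/* read-modify-write, not ordered against the reset below */
	pthread_mutex_unlock(&bypass_lock);
	return NULL;
}

/*
 * The old shrinker: resets under ->nocb_lock only. That lock does not
 * exclude the enqueuer above, so this is a genuine data race and the 0
 * can be overwritten immediately.
 */
static void *buggy_shrinker(void *arg)
{
	pthread_mutex_lock(&nocb_lock);
	lazy_len = 0;
	pthread_mutex_unlock(&nocb_lock);
	return NULL;
}

/*
 * The fixed shape: hold ->nocb_lock and also take ->nocb_bypass_lock,
 * serializing the reset against case 2). (The real flush API also moves
 * the bypass callbacks so rcuog actually processes them.)
 */
static void *fixed_shrinker(void *arg)
{
	pthread_mutex_lock(&nocb_lock);
	pthread_mutex_lock(&bypass_lock);	/* the part the flush API adds */
	lazy_len = 0;
	pthread_mutex_unlock(&bypass_lock);
	pthread_mutex_unlock(&nocb_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, enqueuer, NULL);
	pthread_create(&t2, NULL, buggy_shrinker, NULL);	/* swap in fixed_shrinker to close the race */
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("lazy_len = %ld\n", lazy_len);	/* buggy variant: 0, 1, or 2 depending on interleaving */
	return 0;
}

With buggy_shrinker the final value may be nonzero even though the
shrinker believes it reset the counter, which is the "overwrite from an
enqueuer" the commit message describes; with fixed_shrinker the reset
and the increment are strictly ordered by bypass_lock.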