On Wed, Mar 29, 2023 at 06:02:01PM +0200, Frederic Weisbecker wrote:
> The shrinker resets the lazy callbacks counter in order to trigger the
> pending lazy queue flush through the rcuog kthread. The counter reset is
> protected by the ->nocb_lock against concurrent accesses...except
> for one of them. Here is a list of existing synchronized readers/writer:
> 
> 1) The first lazy enqueuer (incrementing ->lazy_len to 1) does so under
> ->nocb_lock and ->nocb_bypass_lock.
> 
> 2) The further lazy enqueuers (incrementing ->lazy_len above 1) do so
> under ->nocb_bypass_lock _only_.
> 
> 3) The lazy flush checks and resets to 0 under ->nocb_lock and
> ->nocb_bypass_lock.
> 
> The shrinker protects its ->lazy_len reset against cases 1) and 3) but
> not against 2). As such, setting ->lazy_len to 0 under the ->nocb_lock
> may be cancelled right away by an overwrite from an enqueuer, leading
> rcuog to ignore the flush.
> 
> To avoid that, use the proper bypass flush API, which takes care of all
> those details.
> 
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> ---
>  kernel/rcu/tree_nocb.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 1a86883902ce..c321fce2af8e 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1364,7 +1364,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			continue;
>  
>  		rcu_nocb_lock_irqsave(rdp, flags);
> -		WRITE_ONCE(rdp->lazy_len, 0);
> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));

And I do feel much better about this version. ;-)

>  		rcu_nocb_unlock_irqrestore(rdp, flags);
>  		wake_nocb_gp(rdp, false);
>  		sc->nr_to_scan -= _count;
> -- 
> 2.34.1
> 
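For readers less familiar with the nocb internals, here is a minimal
userspace sketch of the lost-update pattern the patch closes. It is
purely illustrative, not kernel code: pthread mutexes stand in for
->nocb_lock and ->nocb_bypass_lock, and all names (lock_a, lock_b,
lazy_len, the three helpers) are hypothetical. The point is only that a
reset taken under one lock does not serialize against an increment taken
under the other, whereas the flush path ends up resetting ->lazy_len
with both locks held.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* ~ ->nocb_lock */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* ~ ->nocb_bypass_lock */
static unsigned long lazy_len;

/* Case 2): further lazy enqueues increment under lock_b only. */
static void lazy_enqueue_again(void)
{
        pthread_mutex_lock(&lock_b);
        lazy_len++;
        pthread_mutex_unlock(&lock_b);
}

/*
 * Old shrinker behavior: lock_a alone does not exclude the writer
 * above, so this reset can be overwritten right away by a concurrent
 * increment, and the flush it was meant to trigger is ignored.
 */
static void shrinker_reset_buggy(void)
{
        pthread_mutex_lock(&lock_a);
        lazy_len = 0;
        pthread_mutex_unlock(&lock_a);
}

/*
 * What routing the reset through the flush API amounts to for
 * ->lazy_len: the counter is cleared with both locks held, which
 * serializes against all three readers/writers listed above.
 */
static void shrinker_reset_fixed(void)
{
        pthread_mutex_lock(&lock_a);
        pthread_mutex_lock(&lock_b);
        lazy_len = 0;
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
}

int main(void)
{
        lazy_enqueue_again();
        shrinker_reset_buggy();  /* racy if run concurrently with an enqueuer */
        shrinker_reset_fixed();  /* serialized against every writer */
        printf("lazy_len = %lu\n", lazy_len);
        return 0;
}

Build with "gcc -pthread"; it only demonstrates the lock coverage, not
the kernel's exact bypass-list semantics.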