On Wed, Mar 22, 2023 at 08:44:53PM +0100, Frederic Weisbecker wrote:
> The shrinker may run concurrently with callbacks (de-)offloading. As
> such, calling rcu_nocb_lock() is very dangerous because it does
> conditional locking. The worst outcome is that rcu_nocb_lock() doesn't
> lock but rcu_nocb_unlock() eventually unlocks, or the reverse, creating
> an imbalance.
>
> Fix this by protecting against (de-)offloading using the barrier mutex.
>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>

Good catch!!!  A few questions, comments, and speculations below.

							Thanx, Paul

> ---
>  kernel/rcu/tree_nocb.h | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index f2280616f9d5..dd9b655ae533 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1336,13 +1336,25 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	unsigned long flags;
>  	unsigned long count = 0;
>
> +	/*
> +	 * Protect against concurrent (de-)offloading. Otherwise nocb locking
> +	 * may be ignored or imbalanced.
> +	 */
> +	mutex_lock(&rcu_state.barrier_mutex);

I was worried about this possibly leading to out-of-memory deadlock,
but if I recall correctly, the (de-)offloading process never allocates
memory, so this should be OK?

The other concern was that the (de-)offloading operation might take a
long time, but the usual cause for that is huge numbers of callbacks,
in which case letting them free their memory is not necessarily a bad
strategy.

> +
>  	/* Snapshot count of all CPUs */
>  	for_each_possible_cpu(cpu) {
>  		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> -		int _count = READ_ONCE(rdp->lazy_len);
> +		int _count;
> +
> +		if (!rcu_rdp_is_offloaded(rdp))
> +			continue;

If the CPU is not offloaded, isn't ->lazy_len guaranteed to be zero?

Or can it contain garbage after a de-offloading operation?

> +
> +		_count = READ_ONCE(rdp->lazy_len);
>
>  		if (_count == 0)
>  			continue;
> +
>  		rcu_nocb_lock_irqsave(rdp, flags);
>  		WRITE_ONCE(rdp->lazy_len, 0);
>  		rcu_nocb_unlock_irqrestore(rdp, flags);
> @@ -1352,6 +1364,9 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  		if (sc->nr_to_scan <= 0)
>  			break;
>  	}
> +
> +	mutex_unlock(&rcu_state.barrier_mutex);
> +
>  	return count ? count : SHRINK_STOP;
>  }
>
> --
> 2.34.1
>
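
For readers following along, here is a simplified sketch of the nocb
locking primitives (illustrative only, eliding the lockdep and irq
details of the real tree_nocb.h code) showing why conditional locking
can go wrong when it races with (de-)offloading:

	static void rcu_nocb_lock(struct rcu_data *rdp)
	{
		/* Take ->nocb_lock only if this rdp is currently offloaded. */
		if (!rcu_rdp_is_offloaded(rdp))
			return;
		raw_spin_lock(&rdp->nocb_lock);
	}

	static void rcu_nocb_unlock(struct rcu_data *rdp)
	{
		/*
		 * This rechecks the offload state, which a concurrent
		 * (de-)offload may have flipped since rcu_nocb_lock()
		 * ran, so this can skip unlocking a lock that was
		 * taken, or unlock a lock that never was taken.
		 */
		if (!rcu_rdp_is_offloaded(rdp))
			return;
		raw_spin_unlock(&rdp->nocb_lock);
	}

Holding ->barrier_mutex across the shrinker's scan pins the offload
state, so each rcu_nocb_lock_irqsave() pairs with a matching
rcu_nocb_unlock_irqrestore().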