On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote: > > > > > > ________________________________________ > > 发件人: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-owner@xxxxxxxxxxxxxxx> 代表 Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> > > 发送时间: 2020年8月19日 8:04 > > 收件人: Paul E. McKenney > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > { > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */ > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > return 0; > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > > + krcp = this_cpu_ptr(&krc) > > > > + schedule_delayed_work(&krcp->monitor_work, 0); > > > > + > > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved. > > > > Maybe be never. > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > > hard time getting too worried about it. ;-) > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > >thanks, > > > > > >- Joel > > > > When cpu going offline, the slub or slab only flush free objects in offline > > cpu cache, put these free objects in node list or return buddy system, > > for those who are still in use, they still stay offline cpu cache. > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be > > called in "cpuhp/%u" per-cpu thread. > > > > Could you please wrap text properly when you post to mailing list, thanks. I > fixed it for you above. > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > unsigned long flags; > > struct rcu_data *rdp; > > struct rcu_node *rnp; > > + struct kfree_rcu_cpu *krcp; > > > > rdp = per_cpu_ptr(&rcu_data, cpu); > > rnp = rdp->mynode; > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > // nohz_full CPUs need the tick for stop-machine to work quickly > > tick_dep_set(TICK_DEP_BIT_RCU); > > + > > + krcp = per_cpu_ptr(&krc, cpu); > > + raw_spin_lock_irqsave(&krcp->lock, flags); > > + schedule_delayed_work(&krcp->monitor_work, 0); > > + raw_spin_unlock_irqrestore(&krcp->lock, flags); > > return 0; > > I realized the above is not good enough for what this is trying to do. Unlike > the slab, the new kfree_rcu objects cannot always be drained / submitted to > RCU because the previous batch may still be waiting for a grace period. So > the above code could very well return with the yet-to-be-submitted kfree_rcu > objects still in the cache. > > One option is to spin-wait here for monitor_todo to be false and keep calling > kfree_rcu_drain_unlock() till then. > > But then that's not good enough either, because if new objects are queued > when interrupts are enabled in the CPU offline path, then the cache will get > new objects after the previous set was drained. Further, spin waiting may > introduce deadlocks. > > Another option is to switch the kfree_rcu() path to non-batching (so new > objects cannot be cached in the offline path and are submitted directly to > RCU), wait for a GP and then submit the work. But then not sure if 1-argument > kfree_rcu() will like that. Or spawn a workqueue that does something like this: 1. Get any pending kvfree_rcu() requests sent off to RCU. 2. Do an rcu_barrier(). 3. Do the cleanup actions. > Probably Qian's original fix for for_each_possible_cpus() is good enough for > the shrinker case, and then we can tackle the hotplug one. It might take some experimentation to find the best solution. Thanx, Paul