> On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > > > >
> > > > > > From: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
> > > > > >
> > > > > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > > > > func, if the shrinker is triggered at this time, we should drain each
> > > > > > possible cpu "krcp".
> > > > > >
> > > > > > Signed-off-by: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
> > > > > > ---
> > > > > >  kernel/rcu/tree.c | 6 +++---
> > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >          unsigned long count = 0;
> > > > > >
> > > > > >          /* Snapshot count of all CPUs */
> > > > > > -        for_each_online_cpu(cpu) {
> > > > > > +        for_each_possible_cpu(cpu) {
> > > > > >                  struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > >                  count += READ_ONCE(krcp->count);
> > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >          int cpu, freed = 0;
> > > > > >          unsigned long flags;
> > > > > >
> > > > > > -        for_each_online_cpu(cpu) {
> > > > > > +        for_each_possible_cpu(cpu) {
> > > > > >                  int count;
> > > > > >                  struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > >          int cpu;
> > > > > >          unsigned long flags;
> > > > > >
> > > > > > -        for_each_online_cpu(cpu) {
> > > > > > +        for_each_possible_cpu(cpu) {
> > > > > >                  struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > >                  raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > >
> > > > > I agree that it can happen.
> > > > >
> > > > > Joel, what is your view?
> > > >
> > > > Yes, I also think it is possible. The patch LGTM. Another fix could be
> > > > to drain the caches in the CPU offline path and save the memory. But
> > > > then it will take a hit during __get_free_page(). If CPU
> > > > offlining/onlining is not frequent, then it will save the lost memory.
> > > >
> > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > >
> > > > Thoughts?
> > >
> > > Do I count this as an ack or a review?  If not, what precisely would
> > > you like the submitter to do differently?
> >
> > Hi Paul,
> > The patch is correct and is definitely an improvement. I was thinking
> > about whether we should always do what the patch is doing when
> > offlining CPUs to save memory, but now I feel that may not be that much
> > of a win to justify more complexity.
> >
> > You can take it with my ack:
> >
> > Acked-by: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
>
> Thank you all!  I wordsmithed a bit as shown below, so please let
> me know if I messed anything up.
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> Author: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
> Date:   Fri Aug 14 14:45:57 2020 +0800
>
>     rcu: Shrink each possible cpu krcp
>
>     CPUs can go offline shortly after kfree_call_rcu() has been invoked,
>     which can leave memory stranded until those CPUs come back online.
>     This commit therefore drains the krcp of each CPU, not just the
>     ones that happen to be online.
>
>     Acked-by: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
>     Signed-off-by: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 02ca8e5..d9f90f6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
>          unsigned long count = 0;
>
>          /* Snapshot count of all CPUs */
> -        for_each_online_cpu(cpu) {
> +        for_each_possible_cpu(cpu) {
>                  struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
>                  count += READ_ONCE(krcp->count);
> @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>          int cpu, freed = 0;
>          unsigned long flags;
>
> -        for_each_online_cpu(cpu) {
> +        for_each_possible_cpu(cpu) {
>                  int count;
>                  struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
>          int cpu;
>          unsigned long flags;
>
> -        for_each_online_cpu(cpu) {
> +        for_each_possible_cpu(cpu) {
>                  struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
>                  raw_spin_lock_irqsave(&krcp->lock, flags);
>
Should we just clean the krc of a CPU when it goes offline?

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b8ccd7b5af82..6decb9ad2421 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
 {
         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
+        struct kfree_rcu_cpu *krcp;

         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
                 return 0;

+        /* Drain the krcp of this CPU. IRQs should be disabled? */
+        krcp = this_cpu_ptr(&krc);
+        schedule_delayed_work(&krcp->monitor_work, 0);
+

A CPU can be offlined and its krcp will be stuck until the shrinker runs. Maybe never.

--
Vlad Rezki
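
For reference, a minimal sketch of the drain-on-offline idea discussed above. It
assumes the per-CPU "krc" variable and the lock/monitor_work fields of struct
kfree_rcu_cpu shown in the quoted code, plus a monitor_todo flag that is assumed
here and does not appear in this thread. The helper name
kfree_rcu_drain_dead_cpu() is made up for illustration; a real hook would be
called from rcutree_dead_cpu() with the outgoing CPU number, which is why the
sketch uses per_cpu_ptr(&krc, cpu) rather than this_cpu_ptr():

/*
 * Illustrative sketch only, not the patch under discussion: kick the
 * outgoing CPU's kfree_rcu() monitor work so that its cached objects
 * are not stranded until the shrinker happens to run.  The locking
 * mirrors kfree_rcu_scheduler_running() in the quoted code above.
 */
static void kfree_rcu_drain_dead_cpu(unsigned int cpu)
{
	unsigned long flags;
	struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

	raw_spin_lock_irqsave(&krcp->lock, flags);
	if (!krcp->monitor_todo) {
		krcp->monitor_todo = true;
		schedule_delayed_work(&krcp->monitor_work, 0);
	}
	raw_spin_unlock_irqrestore(&krcp->lock, flags);
}

Whether draining on offline is worth the extra __get_free_page() cost when the
CPU comes back online is the trade-off Joel raises above.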