On Tue, Apr 04, 2023 at 09:42:06AM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@xxxxxxxxxx>
>
> In commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") a race
> condition between a cpu dying and percpu_counter_sum() iterating online CPUs
> was identified.
> Acctually, there's the same race condition between a cpu dying and
> __percpu_counter_compare(). Here, use 'num_online_cpus()' for quick judgment.
> But 'num_online_cpus()' will be decreased before call 'percpu_counter_cpu_dead()',
> then maybe return incorrect result.
> To solve above issue, also need to add dying CPUs count when do quick judgment
> in __percpu_counter_compare().

Not sure I completely understood the race you are describing. All CPU
accounting is protected with percpu_counters_lock. Is it a real race
that you've faced, or hypothetical? If it's real, can you share stack
traces?

> Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> ---
>  lib/percpu_counter.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index 5004463c4f9f..399840cb0012 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -227,6 +227,15 @@ static int percpu_counter_cpu_dead(unsigned int cpu)
>       return 0;
>  }
>
> +static __always_inline unsigned int num_count_cpus(void)

This doesn't look like a good name. Maybe num_offline_cpus?

> +{
> +#ifdef CONFIG_HOTPLUG_CPU
> +     return (num_online_cpus() + num_dying_cpus());
               ^                                    ^
'return' is not a function. Parentheses are not needed.

Generally speaking, a sequence of atomic operations is not an atomic
operation, so the above doesn't look correct. I don't think it would
be possible to implement raceless accounting based on 2 separate
counters.

Most probably, you'd have to use the same approach as in 8b57b11cca88:

        lock();
        for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
                cnt++;
        unlock();

And if so, I'd suggest implementing cpumask_weight_or() for that.
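
To make the cpumask_weight_or() idea concrete, here is a minimal
userspace sketch, not kernel code. The name bitmap_weight_or() and the
plain unsigned long arrays are hypothetical stand-ins for a cpumask
helper, and __builtin_popcountl() assumes GCC or Clang. It counts the
bits set in (a | b) in a single pass, so a bit present in both masks is
counted once, which adding two separately read counters cannot
guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical userspace analogue of the suggested cpumask_weight_or():
 * return the number of bits set in (a | b) among the first nbits bits.
 */
static unsigned int bitmap_weight_or(const unsigned long *a,
				     const unsigned long *b,
				     unsigned int nbits)
{
	unsigned int words = nbits / BITS_PER_LONG;
	unsigned int rem = nbits % BITS_PER_LONG;
	unsigned int i, w = 0;

	/* Full words: popcount of the union, word by word. */
	for (i = 0; i < words; i++)
		w += __builtin_popcountl(a[i] | b[i]);

	/* Trailing partial word: ignore bits at or above nbits. */
	if (rem) {
		unsigned long mask = (1UL << rem) - 1;

		w += __builtin_popcountl((a[i] | b[i]) & mask);
	}

	return w;
}
```

The sketch does no locking. In the kernel, the caller would still have
to hold the lock across the whole walk, as in the pseudocode above, so
that both masks are read as one consistent snapshot.
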

> +#else
> +     return num_online_cpus();
> +#endif
> +}
> +
>  /*
>   * Compare counter against given value.
>   * Return 1 if greater, 0 if equal and -1 if less
> @@ -237,7 +246,7 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
>
>       count = percpu_counter_read(fbc);
>       /* Check to see if rough count will be sufficient for comparison */
> -     if (abs(count - rhs) > (batch * num_online_cpus())) {
> +     if (abs(count - rhs) > (batch * num_count_cpus())) {
>               if (count > rhs)
>                       return 1;
>               else
> --
> 2.31.1
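
For reference, a small userspace model of the quick check in the hunk
above shows why the CPU count matters. It is a sketch under stated
assumptions, not the kernel implementation: rough_count_is_decisive()
is a hypothetical name, and the model assumes each CPU's unfolded local
delta is smaller than batch in magnitude, so the rough count can be off
by at most batch * ncpus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model of the rough-count test: the rough count alone
 * decides the comparison only when its distance from rhs exceeds the
 * worst-case error of batch * ncpus.
 */
static bool rough_count_is_decisive(int64_t count, int64_t rhs,
				    int32_t batch, unsigned int ncpus)
{
	return llabs(count - rhs) > (int64_t)batch * ncpus;
}
```

With count = 100, rhs = 200 and batch = 32, four counted CPUs give a
threshold of 128, so the check is not decisive and an exact sum is
needed. If one CPU is left out of the count, the threshold drops to 96
and the rough count is trusted, even though that CPU's local delta may
not have been folded into the global count yet.
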