On Thu, Mar 16, 2023 at 07:36:18AM +0800, Hillf Danton wrote:
> On 15 Mar 2023 19:49:36 +1100 Dave Chinner <dchinner@xxxxxxxxxx>
> > @@ -141,11 +141,20 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,
> >  
> >  /*
> >   * Add up all the per-cpu counts, return the result. This is a more accurate
> > - * but much slower version of percpu_counter_read_positive()
> > + * but much slower version of percpu_counter_read_positive().
> > + *
> > + * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
> > + * from CPUs that are in the process of being taken offline. Dying cpus have
> > + * been removed from the online mask, but may not have had the hotplug dead
> > + * notifier called to fold the percpu count back into the global counter sum.
> > + * By including dying CPUs in the iteration mask, we avoid this race condition
> > + * so __percpu_counter_sum() just does the right thing when CPUs are being taken
> > + * offline.
> >   */
> >  s64 __percpu_counter_sum(struct percpu_counter *fbc)
> >  {
> > -	return __percpu_counter_sum_mask(fbc, cpu_online_mask);
> > +
> > +	return __percpu_counter_sum_mask(fbc, cpu_dying_mask);
> >  }
> >  EXPORT_SYMBOL(__percpu_counter_sum);
> > 
> > -- 
> > 2.39.2
> 
> Hm... the window of the race between a dying cpu and the sum of percpu counter
> spotted in commit f689054aace2 is still open after a text-book log message.
> 
> 	cpu 0					cpu 2
> 	---					---
> 	percpu_counter_sum()			percpu_counter_cpu_dead()
> 
> 	raw_spin_lock_irqsave(&fbc->lock, flags);
> 	ret = fbc->count;
> 	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
> 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
> 		ret += *pcount;
> 	}
> 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
> 
> 						raw_spin_lock(&fbc->lock);
> 						pcount = per_cpu_ptr(fbc->counters, cpu);
> 						fbc->count += *pcount;
> 						*pcount = 0;
> 						raw_spin_unlock(&fbc->lock);

There is no race condition updating fbc->count here - I explained this
in the cover letter. i.e. the sum in percpu_counter_sum() is to a
private counter and does not change fbc->count.

Therefore we only need/want to fold the dying cpu percpu count into
fbc->count in the CPU_DEAD callback.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
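
[For illustration, a minimal userspace sketch of the behaviour described above,
not part of the original message and not the kernel implementation: the
model_counter_* names are hypothetical stand-ins for the lib/percpu_counter.c
code quoted in the thread. It shows that the sum path accumulates into a local
variable and only reads fbc->count, while only the CPU_DEAD-style fold writes
fbc->count, and that the fold does not change the summed value.]

	/* Compile with: cc -pthread model_counter.c */
	#include <pthread.h>
	#include <stdint.h>
	#include <stdio.h>

	#define NR_CPUS 4

	struct model_counter {
		pthread_mutex_t lock;		/* stands in for fbc->lock */
		int64_t count;			/* stands in for fbc->count */
		int32_t counters[NR_CPUS];	/* stands in for the per-cpu counters */
	};

	/* Like __percpu_counter_sum(): the sum is private to the caller. */
	static int64_t model_counter_sum(struct model_counter *fbc)
	{
		int64_t ret;
		int cpu;

		pthread_mutex_lock(&fbc->lock);
		ret = fbc->count;
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			ret += fbc->counters[cpu];
		pthread_mutex_unlock(&fbc->lock);
		return ret;	/* fbc->count was only read, never written */
	}

	/* Like percpu_counter_cpu_dead(): the only place that writes fbc->count. */
	static void model_counter_cpu_dead(struct model_counter *fbc, int cpu)
	{
		pthread_mutex_lock(&fbc->lock);
		fbc->count += fbc->counters[cpu];
		fbc->counters[cpu] = 0;
		pthread_mutex_unlock(&fbc->lock);
	}

	int main(void)
	{
		struct model_counter fbc = {
			.lock = PTHREAD_MUTEX_INITIALIZER,
			.count = 100,
			.counters = { 1, 2, 3, 4 },
		};

		printf("sum before cpu dead: %lld\n",
		       (long long)model_counter_sum(&fbc));
		model_counter_cpu_dead(&fbc, 2);	/* fold cpu 2's delta */
		printf("sum after cpu dead:  %lld\n",
		       (long long)model_counter_sum(&fbc));
		/* Both prints show 110: the fold moves the value, it does not change the sum. */
		return 0;
	}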