On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote:
> Percpu counter's compare and add are separate functions: without locking
> around them (which would defeat their purpose), it has been possible to
> overflow the intended limit. Imagine all the other CPUs fallocating
> tmpfs huge pages to the limit, in between this CPU's compare and its add.
>
> I have not seen reports of that happening; but tmpfs's recent addition
> of dquot_alloc_block_nodirty() in between the compare and the add makes
> it even more likely, and I'd be uncomfortable to leave it unfixed.
>
> Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.
>

I think the posted (and by now landed) code is racy.

I had seen there was a follow-up patch which further augmented the
routine, but it did not alter the issue below, so I'm responding to this
thread.

> +/*
> + * Compare counter, and add amount if the total is within limit.
> + * Return true if amount was added, false if it would exceed limit.
> + */
> +bool __percpu_counter_limited_add(struct percpu_counter *fbc,
> +				  s64 limit, s64 amount, s32 batch)
> +{
> +	s64 count;
> +	s64 unknown;
> +	unsigned long flags;
> +	bool good;
> +
> +	if (amount > limit)
> +		return false;
> +
> +	local_irq_save(flags);
> +	unknown = batch * num_online_cpus();
> +	count = __this_cpu_read(*fbc->counters);
> +
> +	/* Skip taking the lock when safe */
> +	if (abs(count + amount) <= batch &&
> +	    fbc->count + unknown <= limit) {
> +		this_cpu_add(*fbc->counters, amount);
> +		local_irq_restore(flags);
> +		return true;
> +	}
> +

Note the value of fbc->count is *not* stabilized.

> +	raw_spin_lock(&fbc->lock);
> +	count = fbc->count + amount;
> +
> +	/* Skip percpu_counter_sum() when safe */
> +	if (count + unknown > limit) {
> +		s32 *pcount;
> +		int cpu;
> +
> +		for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
> +			pcount = per_cpu_ptr(fbc->counters, cpu);
> +			count += *pcount;
> +		}
> +	}
> +
> +	good = count <= limit;
> +	if (good) {
> +		count = __this_cpu_read(*fbc->counters);
> +		fbc->count += count + amount;
> +		__this_cpu_sub(*fbc->counters, count);
> +	}
> +
> +	raw_spin_unlock(&fbc->lock);
> +	local_irq_restore(flags);
> +	return good;
> +}
> +

Consider 2 CPUs executing the function at the same time, where one is
updating fbc->count in the slow path while the other is testing it in
the fast path:

cpu0						cpu1
						if (abs(count + amount) <= batch &&
						    fbc->count + unknown <= limit)
/* gets past the per-cpu traversal */

/* at this point cpu0 decided to bump fbc->count, while cpu1 decided to
 * bump the local counter */

						this_cpu_add(*fbc->counters, amount);
fbc->count += count + amount;

Suppose the fbc->count update puts it close enough to the limit that an
addition from cpu1 would put the entire thing over said limit.

Since the 2 operations are not synchronized, cpu1 can observe fbc->count
prior to the bump and update its local counter, resulting in the
aforementioned overflow.

Am I misreading something here? Is this prevented?

To my grep the only user is quotas in shmem, and I wonder if that use is
even justified. I am aware of the performance realities of atomics.
However, it may very well be that whatever codepaths exercise the counter
are suffering multicore issues elsewhere, making an atomic (in a
dedicated cacheline) a perfectly sane choice for the foreseeable future.
Should this be true, there would be no rush in working out a fixed
variant of the routine.
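
For illustration, below is a minimal sketch of the "atomic in a dedicated
cacheline" variant alluded to above. This is my own sketch, not code from
the thread; struct limited_counter and limited_counter_add() are made-up
names, not an existing kernel API. The point is only that the compare and
the add collapse into a single atomic step, so the window described above
does not exist.

#include <linux/atomic.h>
#include <linux/cache.h>
#include <linux/types.h>

/* Single shared counter, kept alone in its cacheline to limit false sharing. */
struct limited_counter {
	atomic64_t count ____cacheline_aligned_in_smp;
};

/*
 * Add amount unless that would push the total above limit.
 * Returns true if the add happened, false otherwise.
 */
static inline bool limited_counter_add(struct limited_counter *lc,
				       s64 limit, s64 amount)
{
	s64 old = atomic64_read(&lc->count);
	s64 sum;

	do {
		sum = old + amount;
		if (sum > limit)
			return false;
		/* On failure, old is refreshed with the current value and we retry. */
	} while (!atomic64_try_cmpxchg(&lc->count, &old, sum));

	return true;
}

Whether the resulting cacheline bouncing is acceptable is exactly the
performance question raised above; the sketch only shows that correctness
does not need the percpu machinery.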