On Thu, 5 Oct 2023, Dave Chinner wrote: > On Fri, Sep 29, 2023 at 08:42:45PM -0700, Hugh Dickins wrote: > > Percpu counter's compare and add are separate functions: without locking > > around them (which would defeat their purpose), it has been possible to > > overflow the intended limit. Imagine all the other CPUs fallocating > > tmpfs huge pages to the limit, in between this CPU's compare and its add. > > > > I have not seen reports of that happening; but tmpfs's recent addition > > of dquot_alloc_block_nodirty() in between the compare and the add makes > > it even more likely, and I'd be uncomfortable to leave it unfixed. > > > > Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it. > > > > I believe this implementation is correct, and slightly more efficient > > than the combination of compare and add (taking the lock once rather > > than twice when nearing full - the last 128MiB of a tmpfs volume on a > > machine with 128 CPUs and 4KiB pages); but it does beg for a better > > design - when nearing full, there is no new batching, but the costly > > percpu counter sum across CPUs still has to be done, while locked. > > > > Follow __percpu_counter_sum()'s example, including cpu_dying_mask as > > well as cpu_online_mask: but shouldn't __percpu_counter_compare() and > > __percpu_counter_limited_add() then be adding a num_dying_cpus() to > > num_online_cpus(), when they calculate the maximum which could be held > > across CPUs? But the times when it matters would be vanishingly rare. > > > > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > > Cc: Tim Chen <tim.c.chen@xxxxxxxxx> > > Cc: Dave Chinner <dchinner@xxxxxxxxxx> > > Cc: Darrick J. Wong <djwong@xxxxxxxxxx> > > --- > > Tim, Dave, Darrick: I didn't want to waste your time on patches 1-7, > > which are just internal to shmem, and do not affect this patch (which > > applies to v6.6-rc and linux-next as is): but want to run this by you. > > Hmmmm. IIUC, this only works for addition that approaches the limit > from below? That's certainly how I was thinking about it, and what I need for tmpfs. Precisely what its limitations (haha) are, I'll have to take care to spell out. (IIRC - it's a while since I wrote it - it can be used for subtraction, but goes the very slow way when it could go the fast way - uncompared percpu_counter_sub() much better for that. You might be proposing that a tweak could adjust it to going the fast way when coming down from the "limit", but going the slow way as it approaches 0 - that would be neat, but I've not yet looked into whether it's feasily done.) > > So if we are approaching the limit from above (i.e. add of a > negative amount, limit is zero) then this code doesn't work the same > as the open-coded compare+add operation would? To it and to me, a limit of 0 means nothing positive can be added (and it immediately returns false for that case); and adding anything negative would be an error since the positive would not have been allowed. Would a negative limit have any use? It's definitely not allowing all the possibilities that you could arrange with a separate compare and add; whether it's ruling out some useful possibilities to which it can easily be generalized, I'm not sure. Well worth a look - but it'll be easier for me to break it than get it right, so I might just stick to adding some comments. I might find that actually I prefer your way round: getting slower as approaching 0, without any need for specifying a limit?? That the tmpfs case pushed it in this direction, when it's better reversed? Or that might be an embarrassing delusion which I'll regret having mentioned. > > Hence I think this looks like a "add if result is less than" > operation, which is distinct from then "add if result is greater > than" operation that we use this same pattern for in XFS and ext4. > Perhaps a better name is in order? The name still seems good to me, but a comment above it on its assumptions/limitations well worth adding. I didn't find a percpu_counter_compare() in ext4, and haven't got far yet with understanding the XFS ones: tomorrow... > > I'm also not a great fan of having two > similar-but-not-quite-the-same implementations for the two > comparisons, but unless we decide to convert the XFs slow path to > this it doesn't matter that much at the moment.... > > Implementation seems OK at a quick glance, though. Thanks! > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx