On Thu, Aug 27, 2020 at 03:28:48PM +0100, Al Viro wrote: > On Thu, Aug 27, 2020 at 06:19:44PM +0800, Shaokun Zhang wrote: > > From: Yuqi Jin <jinyuqi@xxxxxxxxxx> > > > > It is well known that the performance of atomic_add is better than that of > > atomic_cmpxchg. > > The initial value of @f_count is 1. While @f_count is increased by 1 in > > __fget_files, it will go through three phases: > 0, < 0, and = 0. When the > > fixed value 0 is used as the condition for terminating the increase of 1, > > only atomic_cmpxchg can be used. When we use < 0 as the condition for > > stopping plus 1, we can use atomic_add to obtain better performance. > > Suppose another thread has just removed it from the descriptor table. > > > +static inline bool get_file_unless_negative(atomic_long_t *v, long a) > > +{ > > + long c = atomic_long_read(v); > > + > > + if (c <= 0) > > + return 0; > > Still 1. Now the other thread has gotten to dropping the last reference, > decremented counter to zero and committed to freeing the struct file. > > > + > > + return atomic_long_add_return(a, v) - 1; > > ... and you increment that sucker back to 1. Sure, you return 0, so the > caller does nothing to that struct file. Which includes undoing the > changes to its refecount. > > In the meanwhile, the third thread does fget on the same descriptor, > and there we end up bumping the refcount to 2 and succeeding. Which > leaves the caller with reference to already doomed struct file... > > IOW, NAK - this is completely broken. The whole point of > atomic_long_add_unless() is that the check and conditional increment > are atomic. Together. That's what your optimization takes out. Cheers Al, yes, this is fscked. As an aside, I've previously toyed with the idea of implementing a form of cmpxchg() using a pair of xchg() operations and a smp_cond_load_relaxed(), where the thing would transition through a "reserved value", which might perform better with the current trend of building hardware that doesn't handle CAS failure so well. But I've never had the time/motivation to hack it up, and it relies on that reserved value which obviously doesn't always work (so it would have to be a separate API). Will