Hi Al, 在 2020/8/27 22:28, Al Viro 写道: > On Thu, Aug 27, 2020 at 06:19:44PM +0800, Shaokun Zhang wrote: >> From: Yuqi Jin <jinyuqi@xxxxxxxxxx> >> >> It is well known that the performance of atomic_add is better than that of >> atomic_cmpxchg. >> The initial value of @f_count is 1. While @f_count is increased by 1 in >> __fget_files, it will go through three phases: > 0, < 0, and = 0. When the >> fixed value 0 is used as the condition for terminating the increase of 1, >> only atomic_cmpxchg can be used. When we use < 0 as the condition for >> stopping plus 1, we can use atomic_add to obtain better performance. > > Suppose another thread has just removed it from the descriptor table. > >> +static inline bool get_file_unless_negative(atomic_long_t *v, long a) >> +{ >> + long c = atomic_long_read(v); >> + >> + if (c <= 0) >> + return 0; > > Still 1. Now the other thread has gotten to dropping the last reference, > decremented counter to zero and committed to freeing the struct file. > Apologies that I missed it. >> + >> + return atomic_long_add_return(a, v) - 1; > > ... and you increment that sucker back to 1. Sure, you return 0, so the > caller does nothing to that struct file. Which includes undoing the > changes to its refecount. > > In the meanwhile, the third thread does fget on the same descriptor, > and there we end up bumping the refcount to 2 and succeeding. Which > leaves the caller with reference to already doomed struct file... > > IOW, NAK - this is completely broken. The whole point of > atomic_long_add_unless() is that the check and conditional increment > are atomic. Together. That's what your optimization takes out. > How about this? We try to replace atomic_cmpxchg with atomic_add to improve performance. The atomic_add does not check the current f_count value. Therefore, the number of online CPUs is reserved to prevent multi-core competition. + +static inline bool get_file_unless(atomic_long_t *v, long a) +{ + long cpus = num_online_cpus(); + long c = atomic_long_read(v); + long ret; + + if (c > cpus || c < -cpus) + ret = atomic_long_add_return(a, v) - a; + else + ret = atomic_long_add_unless(v, a, 0); + + return ret; +} + #define get_file_rcu_many(x, cnt) \ - atomic_long_add_unless(&(x)->f_count, (cnt), 0) + get_file_unless(&(x)->f_count, (cnt)) Thanks, Shaokun > . >