On Mon, Aug 31, 2020 at 09:43:31AM +0800, Shaokun Zhang wrote:
> How about this? We try to replace atomic_cmpxchg with atomic_add to improve
> performance. The atomic_add does not check the current f_count value.
> Therefore, the number of online CPUs is reserved to prevent multi-core
> competition.

No. Really, really - no. Not unless you can guarantee that a process
on another CPU won't lose its timeslice, ending up with more than one
increment happening on the same CPU, done by different processes
scheduled there one after another.

If you have some change to atomic_long_add_unless(), do it there.
And get it past the arm64 folks. get_file_rcu() is nothing special
in that respect *AND* it has to cope with any architecture out there.

BTW, keep in mind that there's such a thing as KVM, and race windows
are much wider there: a thread representing a guest CPU might lose
its timeslice whenever the host feels like it. At which point a single
instruction on one guest CPU can take longer than many thousands of
instructions on another CPU of the same guest.

AFAIK, arm64 does support KVM with SMP guests.