> I found some atomic_add/dec are replaced with atomic_add/dec_return,

I am going to replace -return variants with -fetch variants, potentially -fetch

> those helpers with return value imply a full memory barrier around it, but
> others without return value do not. Do you have any numbers to show
> the impact? Maybe atomic_add/dec_return_relaxed can help this.

The generic variant uses arch_cmpxchg() for all atomic variants without
any extra barriers. Therefore, on platforms that use the generic
implementations there won't be any performance difference, except for an
extra branch that checks the result when VM_BUG_ON is enabled.

On x86 the difference between the two is the following:

atomic_add:
    lock add %eax,(%rsi)

atomic_fetch_add:
    lock xadd %eax,(%rsi)

atomic_fetch_add_relaxed:
    lock xadd %eax,(%rsi)

There is no difference between the relaxed and non-relaxed variants.
However, we use lock xadd instead of lock add. I am not sure whether the
performance is going to be different.

Pasha
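
For anyone who wants to look at the lock add vs. lock xadd difference outside the kernel, here is a minimal user-space sketch. It assumes C11 <stdatomic.h> as a stand-in for the kernel atomic_t helpers, which is not the same API. The counter_* names are hypothetical and exist only for this illustration. On x86-64, compilers typically emit lock add when the result is discarded and lock xadd when the old value is used, but the exact output depends on the compiler and flags.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical user-space stand-ins, NOT the kernel's atomic_t API.
 * They only mirror the three cases listed in the mail.
 */

/* Result discarded: on x86-64 this typically compiles to "lock add". */
void counter_add(atomic_int *v, int n)
{
    atomic_fetch_add_explicit(v, n, memory_order_seq_cst);
}

/* Old value returned: on x86-64 this typically compiles to "lock xadd". */
int counter_fetch_add(atomic_int *v, int n)
{
    return atomic_fetch_add_explicit(v, n, memory_order_seq_cst);
}

/* Relaxed ordering, old value returned: still a locked RMW on x86-64. */
int counter_fetch_add_relaxed(atomic_int *v, int n)
{
    return atomic_fetch_add_explicit(v, n, memory_order_relaxed);
}

/* Small functional check: each fetch variant returns the previous value. */
void counter_selfcheck(void)
{
    atomic_int v;

    atomic_init(&v, 0);
    counter_add(&v, 2);                              /* v: 0 -> 2 */
    assert(counter_fetch_add(&v, 3) == 2);           /* v: 2 -> 5 */
    assert(counter_fetch_add_relaxed(&v, -1) == 5);  /* v: 5 -> 4 */
    assert(atomic_load(&v) == 4);
}
```

Compiling this with something like gcc -O2 -S and reading the generated assembly shows which instruction each function ends up with. It says nothing about the cost of one versus the other; that still needs a measurement on the real code path.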