On Thu, Mar 02, 2023 at 03:53:12PM -0500, Peter Xu wrote:
> On Thu, Feb 09, 2023 at 12:01:52PM -0300, Marcelo Tosatti wrote:
> > The goal is to have vmstat_shepherd transfer from per-CPU counters
> > to global counters remotely. For this, an atomic this_cpu_cmpxchg
> > is necessary.
> > 
> > Following the kernel convention for cmpxchg/cmpxchg_local, change
> > ARM's this_cpu_cmpxchg_ helpers to be atomic, and add
> > this_cpu_cmpxchg_local_ helpers which are not atomic.
> 
> I can follow the necessity of having the _local version; however, two
> questions below.
> 
> > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> > 
> > Index: linux-vmstat-remote/arch/arm64/include/asm/percpu.h
> > ===================================================================
> > --- linux-vmstat-remote.orig/arch/arm64/include/asm/percpu.h
> > +++ linux-vmstat-remote/arch/arm64/include/asm/percpu.h
> > @@ -232,13 +232,23 @@ PERCPU_RET_OP(add, add, ldadd)
> >  	_pcp_protect_return(xchg_relaxed, pcp, val)
> >  
> >  #define this_cpu_cmpxchg_1(pcp, o, n) \
> > -	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +	_pcp_protect_return(cmpxchg, pcp, o, n)
> >  #define this_cpu_cmpxchg_2(pcp, o, n) \
> > -	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +	_pcp_protect_return(cmpxchg, pcp, o, n)
> >  #define this_cpu_cmpxchg_4(pcp, o, n) \
> > -	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +	_pcp_protect_return(cmpxchg, pcp, o, n)
> >  #define this_cpu_cmpxchg_8(pcp, o, n) \
> > +	_pcp_protect_return(cmpxchg, pcp, o, n)
> 
> This makes this_cpu_cmpxchg_*() not only non-local, but (especially for
> arm64) it also has memory-barrier implications, since cmpxchg() implies a
> strong memory barrier while the old this_cpu_cmpxchg*() did not, afaiu.
> 
> Maybe it's not a big deal if the audience of this helper is still limited
> (e.g. we can add memory barriers if we don't want the strict ordering
> implication), but just to check with you on whether it's intended, and if
> so whether it may be worth some comments.

It happens that on ARM64, cmpxchg_local == cmpxchg_relaxed. See commit
cf10b79a7d88edc689479af989b3a88e9adf07ff.

This patchset maintains the current behaviour of this_cpu_cmpxchg (for
this_cpu_cmpxchg_local), which was:

#define this_cpu_cmpxchg_1(pcp, o, n) \
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+	_pcp_protect_return(cmpxchg, pcp, o, n)
#define this_cpu_cmpxchg_2(pcp, o, n) \
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+	_pcp_protect_return(cmpxchg, pcp, o, n)
#define this_cpu_cmpxchg_4(pcp, o, n) \
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+	_pcp_protect_return(cmpxchg, pcp, o, n)
#define this_cpu_cmpxchg_8(pcp, o, n) \
+	_pcp_protect_return(cmpxchg, pcp, o, n)

> > +
> > +#define this_cpu_cmpxchg_local_1(pcp, o, n) \
> >  	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +#define this_cpu_cmpxchg_local_2(pcp, o, n) \
> > +	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +#define this_cpu_cmpxchg_local_4(pcp, o, n) \
> > +	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +#define this_cpu_cmpxchg_local_8(pcp, o, n) \
> > +	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> 
> I think cmpxchg_relaxed() == cmpxchg_local() here for aarch64, but should
> we still use cmpxchg_local() to pair with this_cpu_cmpxchg_local_*()?

Since cmpxchg_local == cmpxchg_relaxed, that does not seem necessary.

> Nothing about your patch alone, since it was the same before, but I'm
> wondering whether this is a good time to switch over.

I would say that another patch is more appropriate to change this, if
desired.
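
For reference, the equivalence comes from arm64 having no cheaper
CPU-local cmpxchg than the relaxed one, so the local flavour is simply
aliased to it. Roughly (a paraphrase of arch/arm64/include/asm/cmpxchg.h;
the exact spelling varies across kernel versions):

	/* Relaxed flavour: no memory barrier.  The full cmpxchg() uses
	 * the _mb variant, which adds full ordering around the exchange. */
	#define arch_cmpxchg_relaxed(...)	__cmpxchg_wrapper(    , __VA_ARGS__)
	#define arch_cmpxchg(...)		__cmpxchg_wrapper( _mb, __VA_ARGS__)

	/* A "local" cmpxchg only needs to be atomic on the current CPU,
	 * and there is nothing cheaper than the relaxed sequence, so: */
	#define arch_cmpxchg_local		arch_cmpxchg_relaxed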
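
To illustrate why the remote (shepherd) side wants the fully atomic
flavour, the intended usage pattern is something like the following
(a sketch only; vm_diff and both helpers are made-up names for
illustration, not the actual mm/vmstat.c code):

	#include <linux/percpu.h>

	static DEFINE_PER_CPU(s8, vm_diff);	/* stand-in for a vmstat diff */

	/* Hot path, on the owning CPU.  After this series,
	 * this_cpu_cmpxchg() is a full cmpxchg, so it is atomic against
	 * the remote drain below; users with no remote accessors would
	 * switch to this_cpu_cmpxchg_local() and keep today's cost. */
	static void counter_add_local(s8 delta)
	{
		s8 old;

		do {
			old = this_cpu_read(vm_diff);
		} while (this_cpu_cmpxchg(vm_diff, old, old + delta) != old);
	}

	/* vmstat_shepherd, on another CPU: steal the pending delta so it
	 * can be folded into the global counter without interrupting the
	 * (possibly isolated) owning CPU. */
	static s8 counter_drain_remote(int cpu)
	{
		s8 *pdiff = per_cpu_ptr(&vm_diff, cpu);
		s8 old;

		do {
			old = READ_ONCE(*pdiff);
		} while (cmpxchg(pdiff, old, 0) != old);

		return old;
	}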
> The other thing is: would it be good to copy the arch list for each arch
> patch? Maybe it'll help to extend the audience too.

Yes, I should have done that (or CC'ed each individual maintainer). Will
do on the next version. Thanks.