On Mon, Jan 14, 2019 at 11:30:12AM -0800, Suren Baghdasaryan wrote:
> For memory ordering (which Johannes also pointed out) the critical point is:
>
> times[cpu] += delta           | if g->polling:
> smp_wmb()                     |   g->polling = polling = 0
> cmpxchg(g->polling, 0, 1)     |   smp_rmb()
>                               |   delta = times[*] (through goto SLOWPATH)
>
> So that hotpath writes to times[] then g->polling and slowpath reads
> g->polling then times[]. cmpxchg() implies a full barrier, so we can
> drop smp_wmb(). Something like this:
>
> times[cpu] += delta           | if g->polling:
> cmpxchg(g->polling, 0, 1)     |   g->polling = polling = 0
>                               |   smp_rmb()
>                               |   delta = times[*] (through goto SLOWPATH)
>
> Would that address your concern about ordering?

cmpxchg() implies smp_mb() before and after, so the smp_wmb() on the
left column is superfluous.

The right hand column is actively wrong; because that reads like it
wants to order a store (g->polling = 0) and a load (d = times[]), and
therefore requires smp_mb().

Also, you probably want to use atomic_t for g->polling, because we
(sadly) have architectures where regular stores and atomic ops don't
work 'right'.
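
To make the store -> load point concrete, here is a rough userspace sketch
of the two columns using C11 atomics. It is only an analogy, not the
actual patch: the names hotpath_record(), slowpath_collect() and NR_CPUS
are made up for illustration, the seq_cst operations stand in for the
full ordering the kernel's cmpxchg() provides, and
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb().
polling is an atomic object for both the plain store and the
compare-exchange, in the spirit of the atomic_t suggestion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical CPU count, for this sketch only */

/* Userspace stand-ins for the per-cpu times[] and for g->polling. */
static _Atomic uint64_t times[NR_CPUS];
static atomic_int polling;

/*
 * Left column (hot path): account the delta, then try to flip
 * polling 0 -> 1.  Both operations use the default seq_cst ordering,
 * so no separate write barrier sits between them.
 * Returns 1 if this call flipped polling, 0 if it was already set.
 */
static int hotpath_record(unsigned int cpu, uint64_t delta)
{
	int expected = 0;

	assert(cpu < NR_CPUS);
	atomic_fetch_add(&times[cpu], delta);
	return atomic_compare_exchange_strong(&polling, &expected, 1);
}

/*
 * Right column (slow path): clear polling, then read times[].
 * That is a store followed by loads, so a read barrier is not enough;
 * a full fence goes between them (the role smp_mb() plays in the kernel).
 * Returns the sum of times[], or 0 if polling was not set.
 */
static uint64_t slowpath_collect(void)
{
	uint64_t total = 0;

	if (!atomic_load_explicit(&polling, memory_order_relaxed))
		return 0;

	atomic_store_explicit(&polling, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* order store vs. loads */

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		total += atomic_load_explicit(&times[cpu], memory_order_relaxed);

	return total;
}
```

With the fence in place, either the slow path observes the hot path's
update to times[], or the hot path's compare-exchange observes
polling == 0 and re-arms it; without a full fence both sides could miss
each other.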