On Thu, Oct 13, 2022 at 05:41:06PM +0200, Ard Biesheuvel wrote: > On Thu, 13 Oct 2022 at 15:37, Borislav Petkov <bp@xxxxxxxxx> wrote: > > > > On Wed, Oct 12, 2022 at 12:04:57PM +0000, Justin He wrote: > > > I have a concern about what if cmpxchg failed? Do we have to still > > > guarantee the ordering since cmpxchg will not imply a smp_mb if it > > > failed. > > > > Of course it will imply that. At least on x86 it does. smp_wmb() is a > > compiler barrier there and cmpxchg() already has that barrier semantics > > by clobbering "memory". I'm pretty sure you should have the same thing > > on ARM. > > > > No it definitely does not imply that. A memory clobber is a codegen > construct, and the hardware could still complete the writes in a way > that could result in another observer seeing a mix of old and new > values that is inconsistent with the ordering of the stores as issued > by the compiler. Borislav is thinking too much x86. Failed cmpxchg() does indeed not imply any memory ordering for all architectures -- and while the memory clobber (aka. barrier()) is equivalent to an smp_wmb() on x86, that most certainly doesn't hold for non x86 code. > > says, "new_cache must be put into array after its contents are written". > > > > Are we writing anything into the cache if cmpxchg fails? > > > > The cache fields get updated but the pointer to the struct is never > shared globally if the cmpxchg() fails so not having the barrier on > failure should not be an issue here. That is how I read the code too; so if the cmpxchg fails the object is not published and nobody cares about the ordering. > > > > Besides, I didn't find the paired smp_mb or smp_rmb for this smp_wmb. > > > > Why would there be pairs? I don't understand that statement here. > > > > Typically, the other observer pairs the write barrier with a read barrier. > > In this case, the other observer appears to be ghes_estatus_cached(), > and the reads of the cache struct fields must be ordered after the > read of the cache struct's address. However, there is an implicit > ordering there through an address dependency (you cannot dereference a > struct without knowing its address) so the ordering is implied (and > rcu_dereference() has a READ_ONCE() inside so we are guaranteed to > always dereference the same struct, even if the array slot gets > updated concurrently.) > > If you want to get rid of the barrier, you could drop it and change > the cmpxchg() to cmpxchg_release(). cmpxchg_release() is strictly weaker than cmpxchg(); so I don't see the point there other than optimizing for weak architectures. It can't fundamentally fix anything.