On Thu, Aug 19, 2021 at 10:01:50PM +0000, Nikolova, Tatyana E wrote:
> >
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, August 18, 2021 11:50 AM
> > To: Nikolova, Tatyana E <tatyana.e.nikolova@xxxxxxxxx>
> > Cc: dledford@xxxxxxxxxx; leon@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH rdma-core] irdma: Restore full memory barrier for
> > doorbell optimization
> >
> > On Fri, Aug 13, 2021 at 05:25:49PM -0500, Tatyana Nikolova wrote:
> > > >> 1. Software writing the valid bit in the WQE.
> > > >> 2. Software reading shadow memory (hw_tail) value.
> > >
> > > > You are missing an ordered atomic on this read it looks like
> > >
> > > Hi Jason,
> > >
> > > Why do you think we need atomic ops in this case? We aren't trying to
> > > protect from multiple threads but CPU re-ordering of a write and a
> > > read.
> >
> > Which is what the atomics will do.
> >
> > Barriers are only appropriate when you can't add atomic markers to the
> > actual data that needs ordering.
>
> Hi Jason,
>
> We aren't sure what you mean by atomic markers. We ran a few
> experiments with atomics, but none of the barriers we tried
> smp_mb__{before,after}_atomic(), smp_load_acquire() and
> smp_store_release() translates to a full memory barrier on X86.

Huh? Those are kernel primitives, this is a userspace patch.

Userspace follows the C11 atomics memory model. So I'd expect

  atomic_store_explicit(tail, memory_order_release)
  atomic_load_explicit(tail, memory_order_acquire)

To be the atomics you need.

This will ensure that the read/writes to valid before the atomics are
sequenced correctly, eg no CPU thread can observe an updated tail
without also observing the set valid.

Jason