On Mon, Mar 05, 2018 at 01:42:12PM -0700, Keith Busch wrote:
> On Mon, Mar 05, 2018 at 01:10:53PM -0700, Jason Gunthorpe wrote:
> > So when reading the above mlx code, we see the first wmb() being used
> > to ensure that CPU stores to cachable memory are visible to the DMA
> > triggered by the doorbell ring.
>
> IIUC, we don't need a similar barrier for NVMe to ensure memory is
> visible to DMA since the SQE memory is allocated DMA coherent when the
> SQ is not within a CMB.

You still need it. DMA coherent just means you don't need to call the
DMA API after writing; it says nothing about CPU ordering.

E.g. on x86, DMA coherent memory is just normal system memory, and you
do need the SFENCE between system memory stores and the DMA-triggering
MMIO, apparently.

Jason
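
For illustration only, here is a rough userspace sketch of the ordering described above: fill the SQE in ordinary memory, issue a write barrier, then ring the doorbell. Every name in it (sketch_wmb, sketch_sqe, sketch_queue, sketch_submit) is hypothetical and not taken from the NVMe or mlx drivers. A plain array stands in for the DMA-coherent queue memory and a volatile variable stands in for the MMIO doorbell register. In the kernel the barrier would be wmb() and the doorbell store would go through writel(); the macro below only approximates that.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's wmb(); this is an assumption for
 * the sketch, not the kernel's definition. On x86-64 it issues SFENCE,
 * elsewhere it falls back to a full compiler-provided barrier.
 */
#if defined(__x86_64__)
#define sketch_wmb() __asm__ __volatile__("sfence" ::: "memory")
#else
#define sketch_wmb() __sync_synchronize()
#endif

/* Hypothetical 64-byte submission queue entry. */
struct sketch_sqe {
	uint8_t bytes[64];
};

/* Hypothetical queue state. */
struct sketch_queue {
	struct sketch_sqe *sq;       /* stands in for DMA-coherent memory */
	volatile uint32_t *doorbell; /* stands in for the MMIO doorbell */
	uint16_t tail;
	uint16_t depth;
};

static void sketch_submit(struct sketch_queue *q, const struct sketch_sqe *cmd)
{
	/* 1. Plain CPU stores into the queue memory. */
	memcpy(&q->sq[q->tail], cmd, sizeof(*cmd));
	if (++q->tail == q->depth)
		q->tail = 0;

	/*
	 * 2. Order the stores above before the doorbell store below.
	 *    Coherent memory alone does not provide this ordering.
	 */
	sketch_wmb();

	/* 3. Ring the doorbell, which is what triggers the device's DMA read. */
	*q->doorbell = q->tail;
}
```

The barrier sits between the last store to queue memory and the doorbell store, so the device cannot be told about an entry before that entry is visible to it.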