> Hmm, this is actually consistent with the example below [1].
>
> AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good
> for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers
> (the descriptors, went through the dma_map_page() API), but not for the
> doorbell (a coherent memory, typically allocated via dma_alloc_coherent)
> that requires using the stronger wmb() barrier.

If x86 truly requires a wmb() (aka SFENCE) here then the userspace
RDMA stuff is broken too, and that has been tested to death at this
point..

I looked into this at one point and I thought I concluded that x86 did
not require an SFENCE between a posted PCI write and writes to system
memory to guarantee order with respect to the PCI device? Well, so
long as non-temporal stores and other specialty accesses are not being
used..

Is there a chance a fancy SSE-optimized memcpy or memset, crypto, or
something similar is involved here?

However, Documentation/memory-barriers.txt does seem pretty clear that
the kernel definition of wmb() makes it required here, even if it
might be overkill for x86?

Jason