On Fri, Mar 03, 2017 at 03:45:14PM -0600, Shiraz Saleem wrote:
> This is not quite how our DB logic works. There are additional HW
> steps and nuances in the flow. Unfortunately, to explain this, we
> need to provide details of our internal HW flow for the DB logic. We
> are unable to do so at this time.

Well, it is very problematic to help you define what a cross-arch
barrier should do if you can't explain what you need to have happen
relative to PCI-E.

> > I get the feeling this approach requires MFENCE to do something it
> > doesn't...
>
> MFENCE guarantees that a load won't be reordered before the store, and
> thus we are using it.

If that is all, then the driver can use LFENCE and the
udma_from_device_barrier(). Is that OK?

But fundamentally, PCI is fairly lax about what it permits the root
complex to do, and about the degree to which it requires strong
ordering within the root complex itself for PCI-issued LOADs/STOREs.

It is hard to understand how the order of CPU operations matters when
the PCI operations to different cache lines can be re-ordered inside
the root complex.

An approach may work on some x86-64 systems but be unreliable on other
arches, or even on unusual x86 (e.g. SGI's x86 NUMA systems have
conformant, but lax, ordering rules).

Jason
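For readers following the MFENCE point: the property being claimed ("a load won't be reordered before the store") is the classic store-buffer litmus test. The sketch below shows it in portable C11, assuming `atomic_thread_fence(memory_order_seq_cst)` as the full fence (compilers typically lower it to MFENCE or a locked instruction on x86-64). `run_store_buffer_trials()` is a hypothetical helper written for illustration, not anything from the driver or rdma-core. Note that it only demonstrates CPU-visible ordering between two threads; it says nothing about how the root complex orders the resulting PCI-E transactions, which is the open question above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared locations for the store-buffer ("SB") litmus test. */
static atomic_int x, y;
static int r1, r2; /* read by main only after pthread_join */

/* Thread A: store x, full fence, then load y. */
static void *thread_a(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* store->load ordering */
	r1 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

/* Thread B: mirror image of thread A. */
static void *thread_b(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	r2 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Hypothetical helper: runs n trials and returns how many ended with
 * r1 == 0 && r2 == 0. That outcome requires both loads to be satisfied
 * before the other thread's store is visible, which the seq_cst fences
 * forbid. Returns -1 if a thread cannot be created.
 */
int run_store_buffer_trials(int n)
{
	int forbidden = 0;

	for (int i = 0; i < n; i++) {
		pthread_t a, b;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		r1 = r2 = -1;

		if (pthread_create(&a, NULL, thread_a, NULL) != 0)
			return -1;
		if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
			pthread_join(a, NULL);
			return -1;
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		/* Each load observed either the reset value or the store. */
		assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
		if (r1 == 0 && r2 == 0)
			forbidden++;
	}
	return forbidden;
}
```

Removing the two fences makes the r1 == r2 == 0 outcome legal (and observable on x86, where stores can sit in the store buffer past later loads). Even with the fences, the guarantee covers only what the CPUs observe, not the order in which a device sees the corresponding writes.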