On Mon, Oct 09, 2017 at 12:28:29PM -0700, Dan Williams wrote:

> > I don't think this has ever come up in the context of an all-device MR
> > invalidate requirement. Drivers already have code to invalidate
> > specific MRs, but to find all MRs that touch certain pages and then
> > invalidate them would be new code.
> >
> > We also have ODP aware drivers that can retarget a MR to new
> > physical pages. If the block map changes DAX should synchronously
> > retarget the ODP MR, not halt DMA.
>
> Have a look at the patch [1], I don't touch the ODP path.

But, does ODP work OK already? I'm not clear on that..

> > Most likely ODP & DAX would need to be used together to get robust
> > user applications, as having the user QP's go to an error state at
> > random times (due to DMA failures) during operation is never going to
> > be acceptable...
>
> It's not random. The process that set up the mapping and registered
> the memory gets SIGIO when someone else tries to modify the file map.
> That process then gets /proc/sys/fs/lease-break-time seconds to fix
> the problem before the kernel force revokes the DMA access.

Well, the process can't fix the problem in bounded time, so it is
random whether it will fail or not.

MR lifetime is under the control of the remote side, and the time to
complete the network exchanges required to release the MRs is hard to
bound. So even if I implement SIGIO properly my app will still likely
have random QP failures under various cases and workloads. :(

This is why ODP should be the focus, because this cannot work fully
reliably otherwise..

> > Perhaps you might want to initially only support ODP MR mappings with
> > DAX and then the DMA fencing issue goes away?
>
> I'd rather try to fix the non-ODP DAX case instead of just turning it off.

Well, what about using SIGKILL if the lease-break-time hits? The
kernel will clean up the MRs when the process exits and this will
fence DMA to that memory.

But, still, if you really want to be fine-grained, then I think
invalidating the impacted MRs is a better solution for RDMA than
trying to do it with the IOMMU...

Jason