On 24/11/16 09:42 AM, Jason Gunthorpe wrote:
> There are three cases to worry about:
>  - Coherent long lived page table mirroring (RDMA ODP MR)
>  - Non-coherent long lived page table mirroring (RDMA MR)
>  - Short lived DMA mapping (everything else)
>
> Like you say below we have to handle short lived in the usual way, and
> that covers basically every device except IB MRs, including the
> command queue on a NVMe drive.

Yes, this makes sense to me. Though I thought regular IB MRs with
regular memory currently pinned the pages (despite being long lived),
which is why we can run up against the "max locked memory" limit. It
doesn't seem so terrible if GPU memory had a similar restriction until
ODP-like solutions get implemented.

>> Yeah, we've had RDMA and O_DIRECT transfers to PCIe backed ZONE_DEVICE
>> memory working for some time. I'd say it's a good fit. The main question
>> we've had is how to expose PCIe bars to userspace to be used as MRs and
>> such.

> Is there any progress on that?

Well, I guess there's some consensus building to do. The existing
options are:

* Device DAX: this could work, but the problem I see with it is that it
  only allows one application to do these transfers. Otherwise there
  would have to be some user-space coordination to figure out which
  application gets what memory.

* Regular DAX in the FS: this doesn't work at this time because the FS
  can move the file you think you're transferring to out from under you.
  Though I understand there's been some work with XFS to solve that
  issue.

Though, we've been considering that the backing memory would be
non-volatile, which adds some of this complexity. If the memory were
volatile, the kernel would just need to do some relatively
straightforward allocation to user-space when asked. For example, with
NVMe, the kernel could give chunks of the CMB buffer to userspace via an
mmap call to /dev/nvmeX. Though I think there's been some push back
against things like that as well.

> I still don't quite get what iopmem was about..
> I thought the
> objection to uncachable ZONE_DEVICE & DAX made sense, so running DAX
> over iopmem and still ending up with uncacheable mmaps still seems
> like a non-starter to me...

The latest incarnation of iopmem simply created a block device backed by
ZONE_DEVICE memory on a PCIe BAR. We then put a DAX FS on it and
user-space could mmap the files and send them to other devices to do P2P
transfers.

I don't think there was a hard objection to uncachable ZONE_DEVICE and
DAX. We did try our experimental hardware with cached ZONE_DEVICE and it
did work but the performance was beyond unusable (which may be a
hardware issue). In the end I feel the driver would have to decide the
most appropriate caching for the hardware and I don't understand why WC
or UC wouldn't work with ZONE_DEVICE.

Logan
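
For reference, the user-space half of the iopmem flow described above
(mmap a file on the DAX filesystem, then hand the mapping to another
device) amounts to an ordinary shared file mapping. The following is
only a minimal sketch under assumptions: the helper name
map_whole_file() and any path passed to it (say, a file under a
hypothetical /mnt/iopmem mount) are illustrative and not part of iopmem,
and the step that hands the address to a peer device (for example
registering it as an MR) is driver-specific and not shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map an entire existing file read/write with MAP_SHARED.
 *
 * On a DAX filesystem such a mapping refers directly to the backing
 * memory; on an ordinary filesystem it goes through the page cache.
 * Returns MAP_FAILED on error; on success stores the mapped length in
 * *len_out. The caller is responsible for munmap().
 */
static void *map_whole_file(const char *path, size_t *len_out)
{
	struct stat st;
	void *addr;
	int fd;

	assert(path != NULL && len_out != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	/* mmap() rejects zero-length mappings, so treat empty files as errors. */
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return MAP_FAILED;
	}

	addr = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);

	/* The mapping stays valid after the descriptor is closed. */
	close(fd);

	if (addr != MAP_FAILED)
		*len_out = (size_t)st.st_size;
	return addr;
}
```

The caching attributes of the resulting mapping (WB, WC or UC) are not
something this code controls; as argued above, that would be up to the
driver backing the memory.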