On Mon, Jun 29, 2020 at 10:31:40AM -0700, Jianxin Xiong wrote:

> ZONE_DEVICE is a new zone for device memory in the memory management
> subsystem. It allows pages from device memory to be described with
> specialized page structures. As a result, calls like get_user_pages()
> can succeed, but what can be done with these page structures may be

get_user_pages() does not succeed with ZONE_DEVICE pages.

> Heterogeneous Memory Management (HMM) utilizes mmu_interval_notifier
> and ZONE_DEVICE to support shared virtual address space and page
> migration between system memory and device memory. HMM doesn't support
> pinning device memory because pages located on the device must be able
> to migrate to system memory when accessed by the CPU. Peer-to-peer
> access is possible if the peer can handle page faults. For RDMA, that
> means the NIC must support on-demand paging.

Peer-to-peer access is currently not possible with hmm_range_fault().

> This patch series adds a dma-buf importer role to the RDMA driver in
> an attempt to support RDMA using device memory such as GPU VRAM.
> Dma-buf is chosen for a few reasons: first, the API is relatively
> simple and allows a lot of flexibility in implementing the buffer
> manipulation ops. Second, it doesn't require page structures. Third,
> dma-buf is already supported in many GPU drivers. However, we are
> aware that existing GPU drivers don't allow pinning device memory via
> the dma-buf interface.

So.. this patch doesn't really do anything new? We could just make an
MR against the DMA buf mmap and get to the same place?

> Pinning and mapping a dma-buf would cause the backing storage to
> migrate to system RAM. This is due to the lack of knowledge about
> whether the importer can perform peer-to-peer access and the lack of
> resource limit control measures for the GPU. For the first part, the
> latest dma-buf driver has a peer-to-peer flag for the importer, but
> the flag is currently tied to dynamic mapping support, which requires
> on-demand paging support from the NIC to work.

ODP for DMA buf?

> There are a few possible ways to address these issues, such as
> decoupling the peer-to-peer flag from dynamic mapping, allowing more
> leeway for individual drivers to make the pinning decision and
> adding GPU resource limit control via cgroup. We would like to get
> comments on this patch series with the assumption that device memory
> pinning via dma-buf is supported by some GPU drivers, and at the
> same time welcome open discussion on how to address the
> aforementioned issues as well as GPU-NIC peer-to-peer access
> solutions in general.

These seem like DMA buf problems, not RDMA problems; why are you asking
these questions with an RDMA patch set? The usual DMA buf people are not
even Cc'd here.

> This is the second version of the patch series. Here are the changes
> from the previous version:
> * Instead of adding a new device method for dma-buf specific
>   registration, the existing method is extended to accept an extra
>   parameter.

I think the comment was that the extra parameter should have been a
umem, or maybe a new umem_description struct, not blindly adding an fd
as a parameter and a whack of EOPNOTSUPPs.
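Roughly the shape I mean, purely as an illustration; none of these names
exist in the tree today, and a real version would have to fit in with
how ib_umem_get() and ib_device_ops already work:

/* Hypothetical sketch only, assuming the usual kernel headers
 * (<linux/types.h>, <rdma/ib_verbs.h>). */
struct ib_umem_description {
	u64 start;		/* user VA, or offset into the dma-buf */
	u64 length;
	u64 virt_addr;		/* iova the MR is mapped at */
	int access_flags;
	int dmabuf_fd;		/* -1 for a normal user MR */
};

/* In ib_device_ops, instead of reg_user_mr(..., int fd, ...): */
struct ib_mr *(*reg_user_mr)(struct ib_pd *pd,
			     const struct ib_umem_description *desc,
			     struct ib_udata *udata);

Then drivers that don't understand dma-buf can be rejected in one common
place instead of every driver growing its own EOPNOTSUPP stub.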
> This series is organized as follows. The first patch adds the common
> code for importing a dma-buf from a file descriptor and pinning and
> mapping the dma-buf pages. Patch 2 extends the reg_user_mr() method
> of the ib_device structure to accept a dma-buf file descriptor as an
> extra parameter. Vendor drivers are updated with the change. Patch 3
> adds a new uverbs command for registering a dma-buf based memory
> region.

The ioctl stuff seems OK, but this doesn't seem to bring any new
functionality?

Jason
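P.S. To make the "MR against the DMA buf mmap" question above concrete,
this is roughly all userspace would need today (the helper name is made
up, pd/fd setup and error handling are omitted, and whether ibv_reg_mr()
actually succeeds depends on how the exporter backs its mmap):

#include <sys/mman.h>
#include <infiniband/verbs.h>

/* Sketch: register an MR over a CPU mapping of a dma-buf fd. */
static struct ibv_mr *reg_mr_over_dmabuf_mmap(struct ibv_pd *pd,
					      int dmabuf_fd, size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			  dmabuf_fd, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* ibv_reg_mr() goes through the regular get_user_pages()
	 * pinning path, so it only works if the exporter's mmap is
	 * backed by ordinary struct pages in system RAM -- which is
	 * where the cover letter's pinning approach ends up as well. */
	return ibv_reg_mr(pd, addr, len,
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ |
			  IBV_ACCESS_REMOTE_WRITE);
}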