Re: ibv_reg_mr and mmap'ed memory


 



On 11/05/2018 09:59 PM, Jason Gunthorpe wrote:
On Thu, Nov 01, 2018 at 10:05:04AM +0100, Jörn Schumacher wrote:
On 10/31/2018 05:20 PM, Jason Gunthorpe wrote:
On Wed, Oct 31, 2018 at 03:59:59PM +0100, Jörn Schumacher wrote:


On 10/26/2018 05:52 PM, Jason Gunthorpe wrote:
On Fri, Oct 26, 2018 at 04:55:29PM +0200, Jörn Schumacher wrote:
Hi all,

I am trying to register an mmap'ed memory region as send buffer using
ibv_reg_mr (via a libfabric call).

The address is a virtual address that has been mmap'ed from a kernel address
in a custom kernel driver. The mapping uses remap_pfn_range:


      vma->vm_flags |= VM_DONTEXPAND;
      vma->vm_flags |= VM_DONTDUMP;
      vma->vm_flags |= VM_LOCKED;
      remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size, vma->vm_page_prot);


The ibv_reg_mr call fails with -22 (Invalid argument). The call succeeds if
I replace the buffer with a simple malloc'ed buffer.

Why would this not work with mmap'ed memory? Is there a way of mmap'ing the
kernel address to user space that would allow the memory registration?

mmap'd memory has to be compatible with get_user_pages, and
remap_pfn_range does not produce memory like that.


Thanks Jason. CC'ing Markus who wrote the driver I am talking about.

This driver maintains a large, physically contiguous chunk of memory. We
already know the physical address of this buffer and it is pinned, so the
call to get_user_pages seems redundant.

Is there a way to go around ibv_reg_mr and still use addresses from this
memory chunk in ibv_post_send/ibv_post_recv calls?

AFAIK this other driver would have to somehow provide a system call to
create the IB MR via the kernel API that can use physical addresses.

Better would be to arrange so the other driver creates a user VMA that
is backed by struct pages so get_user_pages works right

Do you know an alternative to remap_pfn_range that would do the memory
mapping *and* be compatible with get_user_pages?

No...


Eventually we found a solution that works for our use case. I would like to share it here in case somebody stumbles across this thread with a similar problem.

To summarize the problem once more: We have a driver that manages large buffers used by a PCIe device for DMA writes. We would like to use these buffers in RDMA calls, but ibv_reg_mr fails because the mmap'ed memory is not compatible with get_user_pages, which the RDMA stack relies on.

The driver mentioned above was written by Markus and is not published anywhere right now, but the code could be shared (without guarantee of support) if it is of interest to anybody.

In fact there are two approaches that work.

Approach 1:
There is a verbs extension that allows the registration of physical addresses. This verb is not available in the mainline kernel, but the Mellanox OFED driver, for example, supports it. The concept is written up in [1]; in a nutshell it involves calling ibv_exp_reg_mr with the IBV_EXP_ACCESS_PHYSICAL_ADDR flag. The resulting MR is not associated with any particular memory address, but rather covers the full physical address space.

The *physical* address can then be used in verb calls. Our driver exposes the physical address of managed memory to userspace, so this approach works fine.
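
For reference, the registration looks roughly like the sketch below. Since the experimental verbs exist only in OFED, the exact header, struct fields and flag names may differ between releases; the names here follow the write-up in [1], and the pd, qp and the physical buffer address are assumed to come from the usual setup code.

    /* Sketch only: OFED experimental verbs, not part of mainline rdma-core.
     * Field and flag names follow [1] and may vary between OFED releases. */
    #include <infiniband/verbs_exp.h>

    static struct ibv_mr *reg_phys_mr(struct ibv_pd *pd)
    {
            struct ibv_exp_reg_mr_in in = {};

            in.pd = pd;
            in.addr = NULL;   /* no virtual address: the MR is not tied to a buffer, */
            in.length = 0;    /* it covers the full physical address space */
            in.exp_access = IBV_EXP_ACCESS_LOCAL_WRITE |
                            IBV_EXP_ACCESS_REMOTE_READ |
                            IBV_EXP_ACCESS_REMOTE_WRITE |
                            IBV_EXP_ACCESS_PHYSICAL_ADDR;

            return ibv_exp_reg_mr(&in);
    }

    /* Work requests then carry the physical address of the DMA buffer
     * (which our driver exposes to user space) instead of a virtual one. */
    static int post_phys_send(struct ibv_qp *qp, struct ibv_mr *mr,
                              uint64_t phys_addr, uint32_t len)
    {
            struct ibv_sge sge = {
                    .addr   = phys_addr,   /* physical, not virtual */
                    .length = len,
                    .lkey   = mr->lkey,
            };
            struct ibv_send_wr wr = {
                    .sg_list = &sge,
                    .num_sge = 1,
                    .opcode  = IBV_WR_SEND,
            };
            struct ibv_send_wr *bad_wr;

            return ibv_post_send(qp, &wr, &bad_wr);
    }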

To get this to play together with libfabric we had to patch it slightly [2]. However, this is unlikely to land in mainline libfabric.


Approach 2:
The other idea is to mmap the device driver's memory into user space in a way that is compatible with the RDMA drivers. Our original driver uses remap_pfn_range, which works fine for plain mmap, but the resulting mapping is not compatible with the get_user_pages call used in the Linux RDMA drivers. The alternative to remap_pfn_range is to back the mapping VMA with an implementation of the nopage method. This is described in detail in Rubini's LDD3 [3].

An mmap based on the nopage approach produces a mapping that is compatible with get_user_pages. Hence, the virtual address of such a mapping can be used directly in any libibverbs or libfabric calls.
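
To make this concrete, here is a minimal sketch of such an mmap path. The buffer bookkeeping (my_buf and its pages[] array) is hypothetical, and modern kernels spell the LDD3 nopage method as the .fault operation; the prototype below follows recent kernels (vm_fault_t and the single-argument handler), so older kernels need slightly different signatures.

    /* Minimal sketch of a fault-based mmap.  my_buf and its pages[] array are
     * hypothetical stand-ins for the driver's real bookkeeping; the buffer must
     * be backed by struct pages (e.g. alloc_pages()/vmalloc_user), not raw PFNs. */
    #include <linux/fs.h>
    #include <linux/mm.h>

    struct my_buf {
            struct page **pages;     /* one entry per page of the DMA buffer */
            unsigned long npages;
    };

    static vm_fault_t my_buf_fault(struct vm_fault *vmf)
    {
            struct my_buf *buf = vmf->vma->vm_private_data;
            struct page *page;

            if (vmf->pgoff >= buf->npages)
                    return VM_FAULT_SIGBUS;

            page = buf->pages[vmf->pgoff];
            get_page(page);          /* reference for the new mapping */
            vmf->page = page;        /* core MM inserts the PTE for us */
            return 0;
    }

    static const struct vm_operations_struct my_buf_vm_ops = {
            .fault = my_buf_fault,
    };

    static int my_buf_mmap(struct file *filp, struct vm_area_struct *vma)
    {
            /* No remap_pfn_range(): pages are faulted in lazily, so the
             * resulting mapping stays compatible with get_user_pages(). */
            vma->vm_ops = &my_buf_vm_ops;
            vma->vm_private_data = filp->private_data;
            return 0;
    }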


We opted for the 2nd approach.


Cheers,
  Markus & Jörn


[1] https://community.mellanox.com/docs/DOC-2480
[2] https://github.com/joerns/libfabric/compare/v1.6.x...joerns:phys_addr_mr
[3] https://lwn.net/Kernel/LDD3/, Chapter 15 "Memory Mapping and DMA"


