Re: [PATCH RFC rdma-core] Verbs: Introduce import verbs for device, PD, MR

Yishai Hadas <yishaih@xxxxxxxxxxxxxxxxxx> · Mon, 11 May 2020 18:35:35 +0300

On 5/11/2020 5:31 PM, Gal Pressman wrote:
On 11/05/2020 16:12, Yishai Hadas wrote:
Introduce import verbs for device, PD and MR. They enable processes to
share their ibv_context and then share the PDs and MRs associated with it.

A process opens a device and then uses one of the Linux system calls
to dup its 'cmd_fd' member, which lets another process obtain ownership
of it.

Once the other process obtains the 'cmd_fd' it can call
ibv_import_device(), which returns an ibv_context on the original RDMA
device.

On the imported device there is an option to import PD(s) and MR(s) in
order to share those objects.

It is the responsibility of the application to coordinate between all
ibv_context(s) that use the imported objects, such that once destroy is
done no other process touches the object except to unimport it. All
users of the context must collaborate to ensure this.

Matching unimport verbs were introduced for PD and MR; for the device,
the ibv_close_device() API should be used.

Detailed man pages are introduced as part of this RFC patch to clarify
the expected usage and notes.

Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxxxx>

Hi Yishai,

A few questions:
Can you please explain the use case? I remember there was a discussion on the
previous shared PD kernel submission (by Yuval and Shamir) but I'm not sure if
there was a conclusion.

The expected flow and use case are as follows.

One process creates an ibv_context by calling ibv_open_device() and then
shares ownership of its 'cmd_fd' with other processes through some Linux
system call (see the man page in this RFC for some alternatives). Any
other process that owns this 'cmd_fd' can then get its own ibv_context
for the same RDMA device by calling ibv_import_device().
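
As a rough sketch of the fd-sharing step only, assuming SCM_RIGHTS over
a Unix domain socket is the mechanism chosen (it is just one possible
option). The helper names send_fd(), recv_fd() and fd_roundtrip_demo()
are hypothetical and not part of rdma-core, and a pipe stands in for
the real 'cmd_fd' so that no RDMA device is needed:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass one open fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* one byte of real data travels with the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive an fd sent by send_fd().
 * Returns the new fd (valid in the receiving process) or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Demo: a pipe's write end stands in for 'cmd_fd'. It is sent across a
 * socketpair, and a byte written through the received fd must show up on
 * the pipe's read end. Returns 0 if the round trip worked. */
int fd_roundtrip_demo(void)
{
	int sv[2], pfd[2], imported, ret = -1;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(pfd) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (send_fd(sv[0], pfd[1]) == 0) {
		imported = recv_fd(sv[1]);
		if (imported >= 0) {
			if (write(imported, "x", 1) == 1 &&
			    read(pfd[0], &c, 1) == 1 && c == 'x')
				ret = 0;
			close(imported);
		}
	}

	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

In the real flow the sender would pass the 'cmd_fd' of its ibv_context
instead of a pipe end, and the receiver would hand the fd it gets back
from recv_fd() to ibv_import_device().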

At that point those processes really work on the same kernel context,
and PD(s), MR(s) and potentially other objects in the future can be
shared by calling ibv_import_pd()/mr(), assuming that the initiator
process lets the other ones know the kernel handle value.

Once a PD and an MR that points to this PD are shared, memory that was
registered by one process can be used by the others, with the matching
lkey/rkey, for RDMA operations.

Could you please elaborate more on how the process cleanup flow (e.g.
killed process) is going to change? I know it's a very broad question but
I'm just trying to get the general idea.

For now the model in those suggested APIs is that cleanup is done either
explicitly, by calling the relevant destroy command, or implicitly once
all processes that own the cmd_fd have closed it.

From the kernel side there is only one object and its ref count is not
increased as part of the import_xxx() functions; see the notes in the
man pages regarding this point.

What's expected to happen in a case where we have two processes P1 & P2, both
use a shared PD, but separate MRs and QPs (created under the same shared PD).
Now when an RDMA read request arrives at P2's QP, but refers to an MR of P1
(which was not imported, but under the same PD), how would you expect the device
to handle that?

The processes behave almost like 2 threads that each have a QP and an
MR; if you mix them up it will work just like any buggy software.
In this case I would expect the device to access the MR that the RDMA
read request points to. Is there any reason it would behave
differently?

Yishai