Re: RDMA subsystem namespace related questions (was Re: Finding the namespace of a struct ib_device)

Ka-Cheong Poon <ka-cheong.poon@xxxxxxxxxx> · Thu, 8 Oct 2020 19:08:42 +0800

On 10/8/20 6:36 PM, Leon Romanovsky wrote:
On Thu, Oct 08, 2020 at 06:22:03PM +0800, Ka-Cheong Poon wrote:
On 10/7/20 7:16 PM, Leon Romanovsky wrote:
On Wed, Oct 07, 2020 at 04:38:45PM +0800, Ka-Cheong Poon wrote:
On 10/6/20 8:46 PM, Jason Gunthorpe wrote:
On Tue, Oct 06, 2020 at 05:36:32PM +0800, Ka-Cheong Poon wrote:

Kernel modules should not be doing networking unless commanded to by
userspace.

It is still not clear why this is an issue with RDMA
connections but not with general kernel sockets.  It is
not random networking.  There is a purpose.

It is a problem with sockets too, how do the socket users trigger
their socket usages? AFAIK all cases originate with userspace.

A user starts a namespace.  The module is loaded to service
requests.  The module starts a listener.  The user deletes
the namespace.  In this scenario, everything is cleaned up
properly if the listener is a kernel socket.  This is not the
case with RDMA.

Please point to reputable code in upstream doing this.

It is not clear what "reputable" really means here.  If it just
means something in the kernel, then nearly all, if not all, Internet
protocol implementations in the kernel create a control kernel socket
for every network namespace.  That socket is deleted in the
per-namespace exit function.  If it explicitly means a listening
socket, AFS and TIPC in the kernel do that for every namespace.  That
socket is also deleted in the per-namespace exit function.

It is very common for a network protocol to have something like
this for protocol processing.  It is not clear why the RDMA subsystem
behaves differently and forbids this common practice.  Could you
please elaborate on the issues with this practice that prevent the
RDMA subsystem from supporting it?

Just curious, are we talking about theoretical thing here or do you
have concrete and upstream ULP code to present?

As I mentioned in a previous email, I have running code.
Otherwise, why would I go to such great lengths to find
out what is missing in the RDMA subsystem's support for
kernel namespace usage?

So why don't you post this running code?

Will it change the listening RDMA endpoint started by the module from
"rogue" to normal?  This is the fundamental problem.  This is the reason
I ask why the RDMA subsystem behaves like this in the first place.  If
the reason is just that there is no existing user, it is fine.  Unexpectedly,
the reason turns out to be that no kernel module is allowed to create its own
RDMA endpoint without a corresponding user space file descriptor and/or some
form of user space interaction.  This is a very serious restriction on how
the RDMA subsystem can be used by any kernel module.  This has to be sorted
out first.

Note that namespace does not really play a role in this "rogue" reasoning.
The init_net is also a namespace.  The "rogue" reasoning means that no
kernel module should start a listening RDMA endpoint by itself with or
without any extra namespaces.  In fact, to conform to this reasoning, the
"right" thing to do would be to change the code already in upstream to get
rid of the listening RDMA endpoint in init_net!

--
K. Poon
ka-cheong.poon@xxxxxxxxxx