Re: RDMA subsystem namespace related questions (was Re: Finding the namespace of a struct ib_device)

Ka-Cheong Poon <ka-cheong.poon@xxxxxxxxxx> · Mon, 5 Oct 2020 18:27:39 +0800

On 10/2/20 10:04 PM, Jason Gunthorpe wrote:
On Wed, Sep 30, 2020 at 06:32:28PM +0800, Ka-Cheong Poon wrote:
After the aforementioned check on a namespace, what can the client
do?  It still needs to use the existing ib_register_client() to
register with RDMA subsystem.  And after registration, it will get
notifications for all add/remove upcalls on devices not related
to the namespace it is interested in.  The client can work around
this if there is a supported way to find out the namespace of a
device, hence the original proposal of having rdma_dev_to_netns().

Yes, the client would have to check the netns and abort client
registration.

Arguably many of our current clients are wrong in this area since they
only work on init_net anyhow.

It would make sense to introduce an rdma_dev_to_netns() and use it to
block clients on ULPs that use the CM outside init_net.
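
As a rough illustration of the check described above, here is a minimal
userspace model, not kernel code. The struct definitions are stand-ins
for the real kernel types, rdma_dev_to_netns() is the helper proposed in
this thread (assumed here to simply return the device's namespace), and
example_client_add() is a hypothetical client add() upcall that declines
any device outside init_net.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel types; NOT the real definitions. */
struct net { int id; };
struct ib_device { const char *name; struct net *net; };

/* Stand-in for the kernel's initial network namespace. */
static struct net init_net = { 0 };

/*
 * Helper proposed in this thread. Its real signature and behaviour are
 * an assumption; here it just returns the namespace stored in the model.
 */
static struct net *rdma_dev_to_netns(const struct ib_device *dev)
{
	return dev->net;
}

/*
 * Hypothetical client add() upcall: a ULP that only works in init_net
 * refuses devices that belong to any other namespace.
 */
static int example_client_add(struct ib_device *dev)
{
	if (rdma_dev_to_netns(dev) != &init_net)
		return -EOPNOTSUPP;

	/* Per-device client setup would go here. */
	return 0;
}
```

In this model a device in init_net is accepted and a device in any other
namespace gets -EOPNOTSUPP, so the client never attaches to it.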

Will send a simple patch for this.

that namespace to use it.  If there are a large number of namespaces,
there won't be enough devices to assign to all of them (e.g. the
hardware I have access to only supports up to 24 VFs).  The shared
mode can be used in this case.  Could you please explain what needs
to be done to support a large number of namespaces in exclusive
mode?

Modern HW supports many more than 24 VFs; this is the expected
interface.

Do you have a ballpark on how many VFs are supported?  Is it in
the range of many thousands?

BTW, while the shared mode is still here, can there be a simple
way for a client to find out which mode the RDMA subsystem is using?

BTW, if exclusive mode is the future, it may make sense to have
something like rdma_[un]register_net_client().

I don't think we need this.

A new connection comes in and the event handler is called for an
RDMA_CM_EVENT_CONNECT_REQUEST event.  There is no obvious namespace info regarding
the event.  It seems that the only way to find out the namespace info is to
use the context of struct rdma_cm_id.

The rdma_cm_id has only a single namespace, the ULP knows what it is
because it created it. A listening ID can't spawn new IDs in different
namespaces.

The problem is that the handler is not given the listener's
rdma_cm_id when it is called.  It is only given the new rdma_cm_id.

The new cm_id starts with the same ->context as the listener, the ULP should
use this to pass any data, such as the namespace.
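
The context inheritance described above can be sketched with a small
userspace model, not kernel code. The structs are stand-ins that keep
only the field under discussion, and struct ulp_listener_ctx,
model_spawn_conn_id() and ulp_conn_req_netns() are hypothetical names
invented for this illustration. Bundling the namespace together with
whatever else the ULP keeps in ->context is one possible layout, assumed
here, for the case where the context pointer is already in use.

```c
#include <stddef.h>

/* Stand-ins for the kernel types; NOT the real definitions. */
struct net { int id; };
struct rdma_cm_id { void *context; };

/*
 * Hypothetical per-listener ULP state: the namespace plus whatever the
 * ULP already stored in ->context.
 */
struct ulp_listener_ctx {
	struct net *net;
	void *ulp_data;
};

/*
 * Models the CM creating a new ID for an incoming connect request:
 * the new ID starts with the listener's ->context.
 */
static void model_spawn_conn_id(const struct rdma_cm_id *listener,
				struct rdma_cm_id *conn)
{
	conn->context = listener->context;
}

/*
 * Models what a ULP handler for RDMA_CM_EVENT_CONNECT_REQUEST could do.
 * It is given only the new ID, yet can still recover the namespace.
 */
static struct net *ulp_conn_req_netns(const struct rdma_cm_id *conn_id)
{
	const struct ulp_listener_ctx *ctx = conn_id->context;

	return ctx->net;
}
```

The handler never sees the listener's ID here; everything it needs
arrives through the inherited context pointer.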

This is what I suspected as mentioned in the previous email.  But
this makes it inconvenient if the context is already used for
something else.

It seems like a ULP error to drive cm_id lifetime entirely from the
per-net stuff.

It is not a ULP error.  While there are many reasons to delete
a listener, it is not necessary for the listener to die unless the
namespace is going away.

It certainly currently is.

I'm skeptical ULPs should be doing per-ns stuff like that. A ns aware
ULP should fundamentally be linked to some FD and the ns to use should
be derived from the process that FD is linked to. Keeping per-ns stuff
seems wrong.

It is a kernel module.  Which FD are you referring to?  It is
unclear why a kernel module must associate itself with a user
space FD.  Is there a particular reason that rdma_create_id()
needs to behave differently than sock_create_kern() in this
regard?

While we are discussing per-namespace stuff, what is the reason
that the cma_wq is a global shared by all namespaces instead of
per namespace?  Is there a problem with having a per-namespace cma_wq?

--
K. Poon
ka-cheong.poon@xxxxxxxxxx