On Wed, 2019-02-20 at 18:03 +0000, Parav Pandit wrote:
> > -----Original Message-----
> > From: Jason Gunthorpe
> > Sent: Wednesday, February 20, 2019 11:56 AM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> > <leon@xxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; RDMA
> > mailing list <linux-rdma@xxxxxxxxxxxxxxx>
> > Subject: Re: [PATCH rdma-next 5/5] RDMA/core: Add command to set
> > ib_core device net namspace sharing mode
> >
> > On Wed, Feb 20, 2019 at 10:52:16AM -0700, Parav Pandit wrote:
> >
> > > Yes. we have the module parameter option in this series.
> > > I came across a user who didn't have LOM nics.
> > > They are directly using rdma nics in their cluster as primary and only
> > > interface.
> >
> > This is very common for IB clusters, a dedicated ethernet management
> > network is a very expensive component at large scale.
> >
> > > I do not know if such IB based networks exist. And if they do, when
> > > they change this mode, they will have connectivity loss.
> >
> > Or they have to change modes before setting up ipoib. It is much less
> > useful.
> >
> > > So we probably shouldn't be doing client unregister-register sequence
> > > as part of this sys operation done by advance user.
> >
> > Provide a 'rdma ulp-restart' netlink command that does the enable/disable
> > sequence?
> >
> Probably we should define more generic rdma dev up/down (start/stop) API
> that network manager sw can consume in sw.

I was thinking more along the lines of trying to change the compat dev
structure.  Right now, it only contains enough data for sysfs entries
and port attributes, but actual file opens go to the parent device.
If you changed that, and created a full alias device, then you could
change the logic like so:

For rdma_dev_access_netns:

	return (net_eq(read_pnet(&dev->rdma_net), net) &&
		(!(dev->flags & INIT_NET_COPY) || ib_device_shared_netns));

So now you either have to have shared netns on and be attempting the
open via a default shared device in that namespace, or you have to be
opening a non-default, namespace-specific device for this namespace.

Then, when you call the netlink command with shared mode off, but not
with disconnect, all we do is unset ib_device_shared_netns and people
will no longer be able to connect via a non-init_net namespace to any of
the INIT_NET_COPY devices.

When you call the netlink command with shared mode off, and with
disconnect true, then we unset ib_device_shared_netns and we also go
through and delete all of the devs with INIT_NET_COPY in their flags.
Those devs need to be how the processes opened the namespace device, and
we need to track enough stuff in those devs that we can pass that dev to
the normal destroy function for ib devices and let it tear it down like
it would a real device, taking all of the opens, pds, mrs, and
everything else right along with it.

What this really makes me think is that we don't want this alias device
model we have now.  We want full ib_device copies (which we will need
for the non-default copy case anyway...if an admin wants to add an RDMA
device to a new ns, and wants to control things like P_Keys allowed,
then we need to be able to fully configure that device).  Then we can
always shut it down forcefully as needed.

I really don't like the disconnect/reconnect model.  There's no reason
someone with a valid namespace association at the time we make this
change should see anything happen.  Just tear down what's invalid, and
leave the rest alone.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
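
To make the proposed rule concrete, here is a rough userspace sketch of the
access check and of the two "shared mode off" variants described above.  It
is only a model, not kernel code: struct net_model, struct dev_model, the
alive field, dev_access_netns() and set_shared_mode_off() are all
hypothetical names invented for illustration.  Only INIT_NET_COPY and the
idea of a global shared-netns switch come from the mail, and the flag's
value here is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
#define INIT_NET_COPY 0x1u	/* flag name from the mail, value assumed */

struct net_model {
	int id;
};

struct dev_model {
	const struct net_model *rdma_net;	/* namespace the device lives in */
	unsigned int flags;
	bool alive;				/* assumption: false once torn down */
};

/* Stands in for ib_device_shared_netns. */
static bool shared_netns = true;

/*
 * Access is allowed only from the device's own namespace, and for a copy
 * of an init_net device only while shared mode is still on.
 */
static bool dev_access_netns(const struct dev_model *dev,
			     const struct net_model *net)
{
	return dev->alive && dev->rdma_net == net &&
	       (!(dev->flags & INIT_NET_COPY) || shared_netns);
}

/*
 * Turn shared mode off.  Without disconnect, only new opens through
 * INIT_NET_COPY devices are refused.  With disconnect, those copies are
 * also torn down; namespace-specific devices are left alone.
 * Returns the number of devices torn down.
 */
static size_t set_shared_mode_off(struct dev_model *devs, size_t n,
				  bool disconnect)
{
	size_t destroyed = 0;

	shared_netns = false;
	if (!disconnect)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].alive && (devs[i].flags & INIT_NET_COPY)) {
			devs[i].alive = false;
			destroyed++;
		}
	}
	return destroyed;
}
```

In this model a device without INIT_NET_COPY stays usable in its own
namespace through both variants, which is the "tear down what's invalid,
and leave the rest alone" behaviour argued for above.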