RE: [PATCH rdma-next 5/5] RDMA/core: Add command to set ib_core device net namspace sharing mode

> -----Original Message-----
> From: Doug Ledford <dledford@xxxxxxxxxx>
> Sent: Wednesday, February 20, 2019 12:42 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jason Gunthorpe
> <jgg@xxxxxxxxxxxx>
> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 5/5] RDMA/core: Add command to set
> ib_core device net namspace sharing mode
> 
> On Wed, 2019-02-20 at 18:03 +0000, Parav Pandit wrote:
> > > -----Original Message-----
> > > From: Jason Gunthorpe
> > > Sent: Wednesday, February 20, 2019 11:56 AM
> > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> > > <leon@xxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; RDMA
> > > mailing list <linux-rdma@xxxxxxxxxxxxxxx>
> > > Subject: Re: [PATCH rdma-next 5/5] RDMA/core: Add command to set
> > > ib_core device net namspace sharing mode
> > >
> > > On Wed, Feb 20, 2019 at 10:52:16AM -0700, Parav Pandit wrote:
> > >
> > > > Yes. we have the module parameter option in this series.
> > > > I came across a user who didn't have LOM nics.
> > > > They are directly using rdma nics in their cluster as primary and
> > > > only interface.
> > >
> > > This is very common for IB clusters, a dedicated ethernet management
> > > network is a very expensive component at large scale.
> > >
> > > > I do not know if such IB based networks exist.  And if they do,
> > > > when they change this mode, they will have connectivity loss.
> > >
> > > > Or they have to change modes before setting up ipoib. It is much
> > > > less useful.
> > >
> > > > > So we probably shouldn't be doing client unregister-register
> > > > > sequence as part of this sys operation done by an advanced user.
> > >
> > > Provide a 'rdma ulp-restart' netlink command that does the
> > > enable/disable sequence?
> > >
> > Probably we should define a more generic rdma dev up/down (start/stop)
> > API that network manager software can consume.
> 
> I was thinking more along the lines of trying to change the compat dev
> structure.  Right now, it only contains enough data for sysfs entries and port
> attributes, but actual file opens go to the parent device.  If you changed
> that, and created a full alias device, then you could change the logic like so:
> 
> For rdma_dev_access_netns:
> 	return (net_eq(read_pnet(&dev->rdma_net), net) &&
> 		(!(dev->flags & INIT_NET_COPY) ||
> 		  ib_device_shared_netns));
> 
> So now you have to both have shared netns on and be attempting the open
> via a default shared device in that namespace, or you have to be opening a
> non-default namespace specific device for this namespace.
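
To make the proposed check concrete, here is a minimal user-space model of it. This is only a sketch under assumptions: `struct net` and `struct model_dev` are hypothetical stand-ins rather than the kernel structures, the value given to `INIT_NET_COPY` is arbitrary, and the pointer comparison stands in for `net_eq(read_pnet(...), net)`.

```c
#include <stdbool.h>

/* Flag name taken from the mail above; the bit value is an assumption. */
#define INIT_NET_COPY 0x1u

/* Hypothetical stand-in for the kernel's struct net. */
struct net {
	int id;
};

/* Hypothetical stand-in for a full alias device living in one namespace. */
struct model_dev {
	const struct net *rdma_net;	/* namespace this device belongs to */
	unsigned int flags;		/* INIT_NET_COPY if it mirrors an init_net device */
};

/* Global sharing mode, toggled by the proposed netlink command. */
static bool ib_device_shared_netns = true;

/*
 * Access is allowed only when the device belongs to the caller's
 * namespace, and it is either a namespace-specific device or sharing
 * is still enabled for the init_net copies.
 */
static bool model_dev_access_netns(const struct model_dev *dev,
				   const struct net *net)
{
	return dev->rdma_net == net &&
	       (!(dev->flags & INIT_NET_COPY) || ib_device_shared_netns);
}
```

In this model, clearing `ib_device_shared_netns` blocks new opens through the `INIT_NET_COPY` devices, while a device added specifically to a namespace stays accessible from that namespace.
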
> 
> Then, when you call the netlink command with shared mode off, but not
> with disconnect, all we do is unset ib_device_shared_netns and people will
> no longer be able to connect via a non-init_net namespace to any of the
> INIT_NET_COPY devices.
> 
> When you call the netlink command with shared mode off, and with
> disconnect true, then we unset ib_device_shared_netns and we also go
> through and delete all of the devs with INIT_NET_COPY in their flags.
> Those devs need to be how the processes opened the namespace device,
> and we need to track enough stuff in those devs that we can pass that dev
> to the normal destroy function for ib devices and let it tear it down like it
> would a real device, taking all of the opens, pds, mrs, and everything else
> right along with it.
> 
> What this really makes me think is that we don't want this alias device
> model we have now.  We want full ib_device copies (which we will need for
> the non-default copy case anyway...if an admin wants to add an RDMA
> device to a new ns, and wants to control things like P_Keys allowed, then we
> need to be able to fully configure that device).  Then we can always shut it
> down forcefully as needed.
> 
> I really don't like the disconnect/reconnect model.  There's no reason
> someone with a valid namespace association at the time we make this
> change should see anything happen.  Just tear down what's invalid, and
> leave the rest alone.
> 
I think we should create full device copies using rdma dev add/del commands, like siw and rxe.
This will have the right control knobs and a clean interface, and it fits the per-ns mode too.
That way we don't keep investing in shared mode to make things work. It really complicates things.

The other option, which Jason proposed on top of these three series, is to selectively add rdma devices to particular namespaces instead of using the share-all model.

So both options, (a) creating per net namespace rdma devices (either via vendor or via sw) or (b) selective addition, fit the per-ns model better than creating bigger copies and managing them.


> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



