> On Oct 9, 2020, at 11:07 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Fri, Oct 09, 2020 at 11:00:22AM -0400, Chuck Lever wrote:
>>
>>
>>> On Oct 9, 2020, at 10:57 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>>>
>>> On Fri, Oct 09, 2020 at 10:48:55AM -0400, Chuck Lever wrote:
>>>> Hi Jason-
>>>>
>>>>> On Oct 9, 2020, at 10:39 AM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>>>>>
>>>>> On Fri, Oct 09, 2020 at 12:49:30PM +0800, Ka-Cheong Poon wrote:
>>>>>> As I mentioned before, this is a very serious restriction on how
>>>>>> the RDMA subsystem can be used in a namespace environment by kernel
>>>>>> module. The reason given for this restriction is that any kernel
>>>>>> socket without a corresponding user space file descriptor is "rogue".
>>>>>> All Internet protocol code create a kernel socket without user
>>>>>> interaction. Are they all "rogue"?
>>>>>
>>>>> You should work with Chuck to make NFS use namespaces properly and
>>>>> then you can propose what changes might be needed with a proper
>>>>> justification.
>>>>
>>>> The NFS server code already uses namespaces for creating listener
>>>> endpoints, already has a user space component that drives the
>>>> creation of listeners, and already passes an appropriate struct
>>>> net to rdma_create_id. As far as I am aware, it is namespace-aware
>>>> and -friendly all the way down to rdma_create_id().
>>>>
>>>> What more needs to be done?
>>>
>>> I have no idea, if you are able to pass a namespace all the way down
>>> to the listening cm_id and everything works right (I'm skeptical) then
>>> there is nothing more to worry about - why are we having this thread?
>>
>> The thread is about RDS, not NFS. NFS has some useful examples to
>> crib, but it's not the main point.
>>
>> I don't think NFS/RDMA namespacing works today, but it's not because
>> NFS isn't ready. I agree that is another thread.
>
> Exactly, so instead of talking about RDS stuff without any patches,

Roger that.
Maybe Ka-Cheong and team can propose some patches to help the
discussion along.

> let's talk about NFS with patches - if you can make NFS work then I
> assume RDS will be happy.

Perhaps not a valid assumption :-)

NFS is a traditional client-server model, and has a user space tool
that drives the creation of endpoints, just as you expect.

With RDS, listener endpoints are not visible in user space. They
are a globally-managed shared resource, more like network interfaces
than listener sockets.

Therefore I think the approach is going to be "one RDS listener per
net namespace". The problem Ka-Cheong is trying to address is how to
manage the destruction of a listener-namespace pair. The extra
reference count on the cm_id is pinning the namespace so it cannot
be destroyed.

> NFS has an established model for using namespaces that the other
> transports uses, so I'd rather focus on this.

Understood, but it doesn't seem like there is enough useful overlap
between the NFS and RDS usage scenarios. With NFS, I would expect
an explicit listener shutdown from userland prior to namespace
destruction.

>>>>> The rules for lifetime on IB clients are tricky, and the interaction
>>>>> with namespaces makes it all a lot more murky.
>>>>
>>>> I think what Ka-cheong is asking is for a detailed explanation of
>>>> these lifetime rules so we can understand why rdma_create_id bumps
>>>> the namespace reference count.
>>>
>>> It is because the CM has no code to revoke a CM ID before the
>>> namespace goes away and the pointer becomes invalid.
>>
>> Is it just a question of "no-one has yet written this code" or is
>> there a deeper technical reason why this has not been done?
>
> It is hard to know without spending a big deep look at this
> stuff.

Fair enough.

--
Chuck Lever
chucklever@xxxxxxxxx