On Sun, Jan 13, 2019 at 05:35:19AM +0000, Parav Pandit wrote: > > > > -----Original Message----- > > From: Jason Gunthorpe <jgg@xxxxxxxx> > > Sent: Saturday, January 12, 2019 11:20 AM > > To: Parav Pandit <parav@xxxxxxxxxxxx> > > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Doug Ledford > > <dledford@xxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; RDMA > > mailing list <linux-rdma@xxxxxxxxxxxxxxx> > > Subject: Re: [PATCH rdma-next v3 07/10] RDMA/core: Implement compat > > device/sysfs tree in net namespace > > > > On Sat, Jan 12, 2019 at 04:21:20AM +0000, Parav Pandit wrote: > > > > > > > > > > From: Jason Gunthorpe <jgg@xxxxxxxx> > > > > Sent: Friday, January 11, 2019 5:31 PM > > > > To: Leon Romanovsky <leon@xxxxxxxxxx> > > > > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky > > > > <leonro@xxxxxxxxxxxx>; RDMA mailing list > > > > <linux-rdma@xxxxxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx> > > > > Subject: Re: [PATCH rdma-next v3 07/10] RDMA/core: Implement compat > > > > device/sysfs tree in net namespace > > > > > > > > On Tue, Dec 18, 2018 at 02:16:01PM +0200, Leon Romanovsky wrote: > > > > > +static __net_init int rdma_dev_init_net(struct net *net) { > > > > > + struct rdma_dev_net *rdma_net = net_generic(net, > > > > rdma_dev_net_id); > > > > > + struct ib_device *device; > > > > > + > > > > > + INIT_LIST_HEAD(&rdma_net->compatdev_list); > > > > > + init_rwsem(&rdma_net->compat_rwsem); > > > > > + > > > > > + /* No need to create any compat devices in init_net. */ > > > > > + if (net_eq(net, &init_net)) > > > > > + return 0; > > > > > + > > > > > + /* Hold device mutex to synchronize with ib_register_device() > > > > > + * which also tries to add compat devices. > > > > > + */ > > > > > + mutex_lock(&ib_device_mutex); > > > > > + /* Hold ib_lists_rwsem read lock; thereby not assume that > > > > > + * ib_device_mutex is always locked while accessing ib_device_list. > > > > > + */ > > > > > + down_read(&ib_lists_rwsem); > > > > > + list_for_each_entry(device, &ib_device_list, core_list) > > > > > + rdma_compatdev_create(device, net); > > > > > + up_read(&ib_lists_rwsem); > > > > > + mutex_unlock(&ib_device_mutex); > > > > > > > > This is racey > > > > > > > No. it is race free. > > > > Are you kidding? I explained below > > > > > rdma_dev_init_net() holds ib_device_mutex and ib_register_device(). > > > mutex synchronizes it as described in Documentation and in comment in > > init_net() above. > > > Will discuss that in other thread where you have comment for the lock. > > > > The problem is that register_device can run after the above dropped the > > device_mutex but before setup_net() puts it the new net on the list. This > > means register_device will *not* see the new net when it does > > for_each_net(). > > > > Because the device_mutex and net lock are not nested, and the sequence is > > backwards, it has a problem. > > Yes. I see it now. > This means we need to keep a shadow list which register_device() can refer while holding device_mutex equivalent lock which synchronizes with init_net() callback, so that shadow list access is safe. I'm reading it and it make me sad, instead of attempt to simplify IB/core, we are adding more and more complexity. resgister/unregister flows should have clear state machine with hooks in every stage. Thanks > > > > > > > CPU 0 CPU 1 > > > > setup_net() > > > > rdma_dev_init_net() > > > > ib_device_register() > > > > list_add_tail_rcu(&net->list..) > > > > > > > > Only thing I can think of it is to have our own list of current > > > > nets, with our own locking and do not use for_each_net() > > > > > No. I would prefer to avoid such list. Creating another list and > > > protecting it using some different lock doesn't offer much gain. > > > > The gain is it allows us to nest/sequence the locks properly. The ordering > > that setup_net() follows is backwards for what we need here. > > > > Jason
Attachment:
signature.asc
Description: PGP signature