Re: [PATCH rdma-next v3 07/10] RDMA/core: Implement compat device/sysfs tree in net namespace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jan 13, 2019 at 05:35:19AM +0000, Parav Pandit wrote:
>
>
> > -----Original Message-----
> > From: Jason Gunthorpe <jgg@xxxxxxxx>
> > Sent: Saturday, January 12, 2019 11:20 AM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Doug Ledford
> > <dledford@xxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; RDMA
> > mailing list <linux-rdma@xxxxxxxxxxxxxxx>
> > Subject: Re: [PATCH rdma-next v3 07/10] RDMA/core: Implement compat
> > device/sysfs tree in net namespace
> >
> > On Sat, Jan 12, 2019 at 04:21:20AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Jason Gunthorpe <jgg@xxxxxxxx>
> > > > Sent: Friday, January 11, 2019 5:31 PM
> > > > To: Leon Romanovsky <leon@xxxxxxxxxx>
> > > > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> > > > <leonro@xxxxxxxxxxxx>; RDMA mailing list
> > > > <linux-rdma@xxxxxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx>
> > > > Subject: Re: [PATCH rdma-next v3 07/10] RDMA/core: Implement compat
> > > > device/sysfs tree in net namespace
> > > >
> > > > On Tue, Dec 18, 2018 at 02:16:01PM +0200, Leon Romanovsky wrote:
> > > > > +static __net_init int rdma_dev_init_net(struct net *net) {
> > > > > +	struct rdma_dev_net *rdma_net =	net_generic(net,
> > > > rdma_dev_net_id);
> > > > > +	struct ib_device *device;
> > > > > +
> > > > > +	INIT_LIST_HEAD(&rdma_net->compatdev_list);
> > > > > +	init_rwsem(&rdma_net->compat_rwsem);
> > > > > +
> > > > > +	/* No need to create any compat devices in init_net. */
> > > > > +	if (net_eq(net, &init_net))
> > > > > +		return 0;
> > > > > +
> > > > > +	/* Hold device mutex to synchronize with ib_register_device()
> > > > > +	 * which also tries to add compat devices.
> > > > > +	 */
> > > > > +	mutex_lock(&ib_device_mutex);
> > > > > +	/* Hold ib_lists_rwsem read lock; thereby not assume that
> > > > > +	 * ib_device_mutex is always locked while accessing ib_device_list.
> > > > > +	 */
> > > > > +	down_read(&ib_lists_rwsem);
> > > > > +	list_for_each_entry(device, &ib_device_list, core_list)
> > > > > +		rdma_compatdev_create(device, net);
> > > > > +	up_read(&ib_lists_rwsem);
> > > > > +	mutex_unlock(&ib_device_mutex);
> > > >
> > > > This is racey
> > > >
> > > No. it is race free.
> >
> > Are you kidding? I explained below
> >
> > > rdma_dev_init_net() holds ib_device_mutex and ib_register_device().
> > > mutex synchronizes it as described in Documentation and in comment in
> > init_net() above.
> > > Will discuss that in other thread where you have comment for the lock.
> >
> > The problem is that register_device can run after the above dropped the
> > device_mutex but before setup_net() puts it the new net on the list. This
> > means register_device will *not* see the new net when it does
> > for_each_net().
> >
> > Because the device_mutex and net lock are not nested, and the sequence is
> > backwards, it has a problem.
>
> Yes. I see it now.
> This means we need to keep a shadow list which register_device() can refer while holding device_mutex equivalent lock which synchronizes with init_net() callback, so that shadow list access is safe.

I'm reading it and it make me sad, instead of attempt to simplify IB/core,
we are adding more and more complexity.

resgister/unregister flows should have clear state machine with
hooks in every stage.

Thanks


>
> >
> > > >   CPU 0                                           CPU 1
> > > > setup_net()
> > > >    rdma_dev_init_net()
> > > >                                                  ib_device_register()
> > > >    list_add_tail_rcu(&net->list..)
> > > >
> > > > Only thing I can think of it is to have our own list of current
> > > > nets, with our own locking and do not use for_each_net()
> >
> > > No. I would prefer to avoid such list. Creating another list and
> > > protecting it using some different lock doesn't offer much gain.
> >
> > The gain is it allows us to nest/sequence the locks properly. The ordering
> > that setup_net() follows is backwards for what we need here.
> >
> > Jason

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux