> -----Original Message----- > From: Cornelia Huck <cohuck@xxxxxxxxxx> > Sent: Friday, November 8, 2019 7:01 AM > To: Parav Pandit <parav@xxxxxxxxxxxx> > Cc: alex.williamson@xxxxxxxxxx; davem@xxxxxxxxxxxxx; > kvm@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Saeed Mahameed > <saeedm@xxxxxxxxxxxx>; kwankhede@xxxxxxxxxx; leon@xxxxxxxxxx; Jiri > Pirko <jiri@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx > Subject: Re: [PATCH net-next 11/19] vfio/mdev: Improvise mdev life cycle and > parent removal scheme > > On Thu, 7 Nov 2019 10:08:26 -0600 > Parav Pandit <parav@xxxxxxxxxxxx> wrote: > > I guess that should be s/Improvise/improve/ in $SUBJECT, no? > Will do. [..] > > > > To summarize, > > mdev_remove() > > read locks -> unreg_sem [ lock-A ] > > [..] > > devlink_unregister(); > > mutex lock devlink_mutex [ lock-B ] > > > > devlink eswitch->switchdev-legacy mode change. > > devlink_nl_cmd_eswitch_set_doit() > > mutex lock devlink_mutex [ lock-B ] > > mdev_unregister_device() > > write locks -> unreg_sem [ lock-A] > > So, this problem starts to pop up once you hook up that devlink stuff with > the mdev stuff, and previous users of mdev just did not have a locking > scheme similar to devlink? > Correct. > > > > Hence, instead of using semaphore, such synchronization is achieved > > using srcu which is more flexible that eliminates nested locking. > > > > SRCU based solution is already proposed before at [2]. > > > > [1] commit 5715c4dd66a3 ("vfio/mdev: Synchronize device create/remove > > with parent removal") [2] > > https://lore.kernel.org/patchwork/patch/1055254/ > > I don't quite recall the discussion there... is this a rework of a patch you > proposed before? Confused. > It was one huge patch, fixing multiple issues. Alex suggested to split into multiple. Initially for this issue I had it srcu, while redoing them to smaller patches, I guess for simplicity I moved to semaphore. Once I enabled all my tested after a break, I realized that fix is not enough. > > > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > --- > > drivers/vfio/mdev/mdev_core.c | 56 +++++++++++++++++++++++--------- > > drivers/vfio/mdev/mdev_private.h | 3 +- > > 2 files changed, 43 insertions(+), 16 deletions(-) > > (...) > > > @@ -207,6 +207,7 @@ int mdev_register_device(struct device *dev, const > struct mdev_parent_ops *ops) > > dev_warn(dev, "Failed to create compatibility class link\n"); > > > > list_add(&parent->next, &parent_list); > > + rcu_assign_pointer(parent->self, parent); > > mutex_unlock(&parent_list_lock); > > > > dev_info(dev, "MDEV: Registered\n"); @@ -250,14 +251,29 @@ void > > mdev_unregister_device(struct device *dev) > > list_del(&parent->next); > > mutex_unlock(&parent_list_lock); > > > > - down_write(&parent->unreg_sem); > > + /* > > + * Publish that this mdev parent is unregistering. So any new > > + * create/remove cannot start on this parent anymore by user. > > + */ > > + rcu_assign_pointer(parent->self, NULL); > > + > > + /* > > + * Wait for any active create() or remove() mdev ops on the parent > > + * to complete. > > + */ > > + synchronize_srcu(&parent->unreg_srcu); > > + > > + /* > > + * At this point it is confirmed that any pending user initiated > > + * create or remove callbacks accessing the parent are completed. > > + * It is safe to remove the parent now. > > + */ > > So, you're putting an srcu-handled self reference there and use that as an > indication whether the parent is unregistering? > Right.