On Mon, 4 Oct 2021 19:34:31 -0300 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Mon, Oct 04, 2021 at 04:25:32PM -0600, Alex Williamson wrote: > > On Fri, 1 Oct 2021 20:22:20 -0300 > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > iommu_group_register_notifier()/iommu_group_unregister_notifier() are > > > built using a blocking_notifier_chain which integrates a rwsem. The > > > notifier function cannot be running outside its registration. > > > > > > When considering how the notifier function interacts with create/destroy > > > of the group there are two fringe cases, the notifier starts before > > > list_add(&vfio.group_list) and the notifier runs after the kref > > > becomes 0. > > > > > > Prior to vfio_create_group() unlocking and returning we have > > > container_users == 0 > > > device_list == empty > > > And this cannot change until the mutex is unlocked. > > > > > > After the kref goes to zero we must also have > > > container_users == 0 > > > device_list == empty > > > > > > Both are required because they are balanced operations and a 0 kref means > > > some caller became unbalanced. Add the missing assertion that > > > container_users must be zero as well. > > > > > > These two facts are important because when checking each operation we see: > > > > > > - IOMMU_GROUP_NOTIFY_ADD_DEVICE > > > Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev() > > > 0 container_users ends the call > > > - IOMMU_GROUP_NOTIFY_BOUND_DRIVER > > > 0 container_users ends the call > > > > > > Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes > > > items from the unbound list. During creation this list is empty, during > > > kref == 0 nothing can read this list, and it will be freed soon. > > > > > > Since the vfio_group_release() doesn't hold the appropriate lock to > > > manipulate the unbound_list and could race with the notifier, move the > > > cleanup to directly before the kfree. > > > > > > This allows deleting all of the deferred group put code. > > > > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > drivers/vfio/vfio.c | 89 +++++---------------------------------------- > > > 1 file changed, 9 insertions(+), 80 deletions(-) > > > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > > > index 08b27b64f0f935..32a53cb3598524 100644 > > > +++ b/drivers/vfio/vfio.c > > > @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container) > > > > > > static void vfio_group_unlock_and_free(struct vfio_group *group) > > > { > > > + struct vfio_unbound_dev *unbound, *tmp; > > > + > > > mutex_unlock(&vfio.group_lock); > > > /* > > > * Unregister outside of lock. A spurious callback is harmless now > > > * that the group is no longer in vfio.group_list. > > > */ > > > > This comment is indirectly referencing the vfio_group_try_get() in the > > notifier callback, but as you describe in the commit log, it's actually > > the container_users value that prevents this from racing group release > > now. Otherwise, tricky but looks good. Thanks, > > Do you think the comment should be deleted in this commit? I think I > got it later on.. I think the commit log argument is that notifies racing the group release are harmless so long as the container is unused, and releasing a group with active container users would be unbalanced, which justifies the WARN_ON added here. This comment does get removed in the final patch, but maybe that logic can be noted in a comment by the new sanity test. Thanks, Alex