On Thu, Sep 22, 2022 at 01:10:50PM -0600, Alex Williamson wrote: > > @@ -378,13 +394,20 @@ static struct vfio_group *vfio_get_group(struct iommu_group *iommu_group, > > > > mutex_lock(&vfio.group_lock); > > > > - ret = __vfio_group_get_from_iommu(iommu_group); > > - if (ret) > > - goto err_unlock; > > + ret = vfio_group_find_from_iommu(iommu_group); > > + if (ret) { > > + if (WARN_ON(vfio_group_has_device(ret, dev))) { > > + ret = ERR_PTR(-EINVAL); > > + goto out_unlock; > > + } > > This still looks racy. We only know within vfio_group_has_device() that > the device is not present in the group, what prevents a race between > here and when we finally do add it to group->device_list? This is a condition which is defined to be impossible and by auditing I've checked it can't happen. There is no race in the sense that this can't actually happen, if it does happen it means the group is corrupted. At that point reasoning about locks and such goes out the window too - eg it might be corrupted because of bad locking. When it comes to self-debugging tests we often have these "races", eg in the destroy path we do: WARN_ON(!list_empty(&group->device_list)); Which doesn't hold the appropriate locks either. The point is just to detect that group has been corrupted at a point in time in hopes of guarding against a future kernel bug. > The semantics of vfio_get_group() are also rather strange, 'return a > vfio_group for this iommu_group, but make sure it doesn't include this > device' :-\ Thanks, I think of it as "return a group and also do sanity checks that the returned group has not been corrupted" I don't like the name of this function but couldn't figure a better one. It is something like "find or create a group for a device which we know doesn't already have a group" Jason