On Tue, 16 Mar 2021 14:24:54 -0600 Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > On Tue, 16 Mar 2021 07:38:09 +0000 > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > Sent: Saturday, March 13, 2021 8:56 AM > > > > > > The vfio_device is using a 'sleep until all refs go to zero' pattern for > > > its lifetime, but it is indirectly coded by repeatedly scanning the group > > > list waiting for the device to be removed on its own. > > > > > > Switch this around to be a direct representation, use a refcount to count > > > the number of places that are blocking destruction and sleep directly on a > > > completion until that counter goes to zero. kfree the device after other > > > accesses have been excluded in vfio_del_group_dev(). This is a fairly > > > common Linux idiom. > > > > > > Due to this we can now remove kref_put_mutex(), which is very rarely used > > > in the kernel. Here it is being used to prevent a zero ref device from > > > being seen in the group list. Instead allow the zero ref device to > > > continue to exist in the device_list and use refcount_inc_not_zero() to > > > exclude it once refs go to zero. > > > > > > This patch is organized so the next patch will be able to alter the API to > > > allow drivers to provide the kfree. > > > > > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > --- > > > drivers/vfio/vfio.c | 79 ++++++++++++++------------------------------- > > > 1 file changed, 25 insertions(+), 54 deletions(-) > > > > > > @@ -935,32 +916,18 @@ void *vfio_del_group_dev(struct device *dev) > > > WARN_ON(!unbound); > > > > > > vfio_device_put(device); > > > - > > > - /* > > > - * If the device is still present in the group after the above > > > - * 'put', then it is in use and we need to request it from the > > > - * bus driver. The driver may in turn need to request the > > > - * device from the user. We send the request on an arbitrary > > > - * interval with counter to allow the driver to take escalating > > > - * measures to release the device if it has the ability to do so. > > > - */ > > > > Above comment still makes sense even with this patch. What about > > keeping it? otherwise: > > The comment is not exactly correct after this code change either, the > device will always be present in the group after this 'put'. Instead, > the completion now indicates the reference count has reached zero. If > it's worthwhile to keep more context to the request callback, perhaps: > > /* > * If there are still outstanding device references, such as > * from the device being in use, periodically kick the optional > * device request callback while waiting. > */ I like that comment; I don't think it hurts to be a bit verbose here. > > It's also a little obvious that's what we're doing here even without > the comment. Thanks, > > Alex