On Wed, Mar 17, 2021 at 09:12:44AM +0100, Cornelia Huck wrote: > On Tue, 16 Mar 2021 14:24:54 -0600 > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > > > On Tue, 16 Mar 2021 07:38:09 +0000 > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > > Sent: Saturday, March 13, 2021 8:56 AM > > > > > > > > The vfio_device is using a 'sleep until all refs go to zero' pattern for > > > > its lifetime, but it is indirectly coded by repeatedly scanning the group > > > > list waiting for the device to be removed on its own. > > > > > > > > Switch this around to be a direct representation, use a refcount to count > > > > the number of places that are blocking destruction and sleep directly on a > > > > completion until that counter goes to zero. kfree the device after other > > > > accesses have been excluded in vfio_del_group_dev(). This is a fairly > > > > common Linux idiom. > > > > > > > > Due to this we can now remove kref_put_mutex(), which is very rarely used > > > > in the kernel. Here it is being used to prevent a zero ref device from > > > > being seen in the group list. Instead allow the zero ref device to > > > > continue to exist in the device_list and use refcount_inc_not_zero() to > > > > exclude it once refs go to zero. > > > > > > > > This patch is organized so the next patch will be able to alter the API to > > > > allow drivers to provide the kfree. > > > > > > > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > > drivers/vfio/vfio.c | 79 ++++++++++++++------------------------------- > > > > 1 file changed, 25 insertions(+), 54 deletions(-) > > > > > > > > > @@ -935,32 +916,18 @@ void *vfio_del_group_dev(struct device *dev) > > > > WARN_ON(!unbound); > > > > > > > > vfio_device_put(device); > > > > - > > > > - /* > > > > - * If the device is still present in the group after the above > > > > - * 'put', then it is in use and we need to request it from the > > > > - * bus driver. The driver may in turn need to request the > > > > - * device from the user. We send the request on an arbitrary > > > > - * interval with counter to allow the driver to take escalating > > > > - * measures to release the device if it has the ability to do so. > > > > - */ > > > > > > Above comment still makes sense even with this patch. What about > > > keeping it? otherwise: > > > > The comment is not exactly correct after this code change either, the > > device will always be present in the group after this 'put'. Instead, > > the completion now indicates the reference count has reached zero. If > > it's worthwhile to keep more context to the request callback, perhaps: > > > > /* > > * If there are still outstanding device references, such as > > * from the device being in use, periodically kick the optional > > * device request callback while waiting. > > */ > > I like that comment; I don't think it hurts to be a bit verbose here. I would prefer the comment explain why the driver should return from request with refs held and what it is supposed to do on later calls. This loop mechanism is strange, I didn't look at what the drivers implement under this. I don't see this approach in other places that are able to disconnect their HW drivers from the uAPI (in RDMA land we call this disassociation) Jason