On Tue, Aug 03, 2021 at 10:34:06AM -0600, Alex Williamson wrote: > On Wed, 28 Jul 2021 21:49:18 -0300 > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > Keep track of all the vfio_devices that have been added to the device set > > and use this list in vfio_pci_try_bus_reset() instead of trying to work > > backwards from the pci_device. > > > > The dev_set->lock directly prevents devices from joining/leaving the set, > > which further implies the pci_device cannot change drivers or that the > > vfio_device be freed, eliminating the need for get/put's. > > > > Completeness of the device set can be directly measured by checking if > > every PCI device in the reset group is also in the device set - which > > proves that VFIO drivers are attached to everything. > > > > This restructuring corrects a call to pci_dev_driver() without holding the > > device_lock() and removes a hard wiring to &vfio_pci_driver. > > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> > > drivers/vfio/pci/vfio_pci.c | 148 +++++++++++++++--------------------- > > 1 file changed, 62 insertions(+), 86 deletions(-) > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > > index 5d6db93d6c680f..a1ae9a83a38621 100644 > > +++ b/drivers/vfio/pci/vfio_pci.c > > @@ -404,6 +404,9 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev) > > struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; > > int i, bar; > > > > + /* For needs_reset */ > > + lockdep_assert_held(&vdev->vdev.dev_set->lock); > > + > > /* Stop the device from further DMA */ > > pci_clear_master(pdev); > > > > @@ -2145,7 +2148,7 @@ static struct pci_driver vfio_pci_driver = { > > .err_handler = &vfio_err_handlers, > > }; > > > > -static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) > > +static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) > > { > > struct vfio_devices *devs = data; > > struct vfio_device *device; > > @@ -2165,8 +2168,11 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) > > > > vdev = container_of(device, struct vfio_pci_device, vdev); > > > > - /* Fault if the device is not unused */ > > - if (device->open_count) { > > + /* > > + * Locking multiple devices is prone to deadlock, runaway and > > + * unwind if we hit contention. > > + */ > > + if (!vfio_pci_zap_and_vma_lock(vdev, true)) { > > vfio_device_put(device); > > return -EBUSY; > > } > > @@ -2175,112 +2181,82 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) > > return 0; > > } > > > > -static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) > > +static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data) > > { > > - struct vfio_devices *devs = data; > > - struct vfio_device *device; > > - struct vfio_pci_device *vdev; > > + struct vfio_device_set *dev_set = data; > > + struct vfio_device *cur; > > > > - if (devs->cur_index == devs->max_index) > > - return -ENOSPC; > > + lockdep_assert_held(&dev_set->lock); > > > > - device = vfio_device_get_from_dev(&pdev->dev); > > - if (!device) > > - return -EINVAL; > > - > > - if (pci_dev_driver(pdev) != &vfio_pci_driver) { > > - vfio_device_put(device); > > - return -EBUSY; > > - } > > - > > - vdev = container_of(device, struct vfio_pci_device, vdev); > > + list_for_each_entry(cur, &dev_set->device_list, dev_set_list) > > + if (cur->dev == &pdev->dev) > > + return 0; > > + return -EBUSY; > > +} > > > > - /* > > - * Locking multiple devices is prone to deadlock, runaway and > > - * unwind if we hit contention. > > - */ > > - if (!vfio_pci_zap_and_vma_lock(vdev, true)) { > > - vfio_device_put(device); > > - return -EBUSY; > > +/* > > + * vfio-core considers a group to be viable and will create a vfio_device even > > + * if some devices are bound to drivers like pci-stub or pcieport. Here we > > + * require all PCI devices to be inside our dev_set since that ensures they stay > > + * put and that every driver controlling the device can co-ordinate with the > > + * device reset. > > + */ > > +static struct pci_dev *vfio_pci_find_reset_target(struct vfio_pci_device *vdev) > > +{ > > + struct vfio_device_set *dev_set = vdev->vdev.dev_set; > > + struct vfio_pci_device *cur; > > + bool needs_reset = false; > > + > > + /* No VFIO device has an open device FD */ > > + list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) { > > + if (cur->vdev.open_count) > > + return NULL; > > + needs_reset |= cur->needs_reset; > > } > > + if (!needs_reset) > > + return NULL; > > > > - devs->devices[devs->cur_index++] = vdev; > > - return 0; > > + /* All PCI devices in the group to be reset need to be in our dev_set */ > > + if (vfio_pci_for_each_slot_or_bus( > > + vdev->pdev, vfio_pci_is_device_in_set, dev_set, > > + !pci_probe_reset_slot(vdev->pdev->slot))) > > + return NULL; > > + return cur->pdev; > > > I don't understand the "reset target" aspect of this, cur->pdev is > simply the last entry in the dev_set->devices_list... Oh, hum, this got messed up someplace along the way, the original code was just: /* Does at least one need a reset? */ for (i = 0; i < devs.cur_index; i++) { tmp = devs.devices[i]; if (tmp->needs_reset) { ret = pci_reset_bus(vdev->pdev); break; So should this, I'll fix it up, thanks > I think the vfio_pci_find_reset_target() function needs to be re-worked > to just tell us true/false that it's ok to reset the provided device, > not to anoint an arbitrary target device. Thanks, Yes, though this logic is confusing, why do we need to check if any device needs a reset at this point? If we are being asked to reset vdev shouldn't vdev needs_reset? Or is the function more of a 'synchronize pending reset' kind of thing? Jason