Re: [PATCH] pci: Add "try" reset interfaces

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2014-01-07 at 16:15 -0700, Bjorn Helgaas wrote:
> Hi Alex,
> 
> Sorry for the delay in looking at this.
> 
> On Mon, Dec 16, 2013 at 3:14 PM, Alex Williamson
> <alex.williamson@xxxxxxxxxx> wrote:
> > When doing a function/slot/bus reset PCI grabs the device_lock for
> > each device to block things like suspend and driver probes, which is
> > all well and good, but call paths exist where this lock may already be
> > held.  This creates an opportunity for deadlock.  For instance, vfio
> > allows userspace to issue resets so long as it owns the device(s).
> > If a driver unbind .remove callback races with userspace issuing a
> > reset, we have a deadlock as userspace gets stuck waiting on
> > device_lock while another thread has device_lock and waits for .remove
> > to complete.
> 
> Are you talking about vfio_pci_remove() (the vfio_pci_driver .remove()
> method) racing with vfio_pci_ioctl()?

Yes, for instance if the admin does something like attempt to unbind the
device from vfio-pci while it's in use.  This can also happen indirectly
if the device is going away, such as a PF driver attempting to remove
its VFs.

> Or maybe it's vfio_pci_release (the vfio_pci_ops .release() method),
> since it looks like you want to use pci_try_reset_function() there and
> in vfio_pci_ioctl()?

That one too.  If any reset races with pci_driver.remove, whether it be
from ioctl or my own release callback, we'll hit a deadlock.

> Either way, aren't there at least potentially more locking issues than
> just the reset problem?  Seems like any ioctl that might take the
> device_lock could have the same problem.

I don't know of any others, but our QE folks hit this one pretty
regularly.  It can all be explained away as user error, but a user
should not be able to cause a kernel deadlock, which is why the user
exposed interfaces are converted to use this try-lock interface.

>   How do you make sure there's
> no userspace owner of the device before you release the device or
> remove the driver?

Userspace has a file descriptor for the device, so we know through
reference counting when the device becomes unused.  When our
pci_driver.remove() callback happens we block until the user releases
the device, so perhaps it more accurate to describe it as a nested lock
problem than a race, but the paths are asynchronous so even if we tested
the lock in advance there's a potential race.

If you have better ideas how to solve this, let me know.  Thanks,

Alex

> > To resolve this, we can make a version of the reset
> > interfaces which use trylock.  With this, we can safely attempt a
> > reset and return error to userspace if there is contention.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > ---
> >  drivers/pci/pci.c   |  155 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pci.h |    3 +
> >  2 files changed, 158 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 33120d1..de6416f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3445,6 +3445,18 @@ static void pci_dev_lock(struct pci_dev *dev)
> >         device_lock(&dev->dev);
> >  }
> >
> > +/* Return 1 on successful lock, 0 on contention */
> > +static int pci_dev_trylock(struct pci_dev *dev)
> > +{
> > +       if (pci_cfg_access_trylock(dev)) {
> > +               if (device_trylock(&dev->dev))
> > +                       return 1;
> > +               pci_cfg_access_unlock(dev);
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static void pci_dev_unlock(struct pci_dev *dev)
> >  {
> >         device_unlock(&dev->dev);
> > @@ -3588,6 +3600,34 @@ int pci_reset_function(struct pci_dev *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(pci_reset_function);
> >
> > +/**
> > + * pci_try_reset_function - quiesce and reset a PCI device function
> > + * @dev: PCI device to reset
> > + *
> > + * Same as above, except return -EAGAIN if unable to lock device.
> > + */
> > +int pci_try_reset_function(struct pci_dev *dev)
> > +{
> > +       int rc;
> > +
> > +       rc = pci_dev_reset(dev, 1);
> > +       if (rc)
> > +               return rc;
> > +
> > +       pci_dev_save_and_disable(dev);
> > +
> > +       if (pci_dev_trylock(dev)) {
> > +               rc = __pci_dev_reset(dev, 0);
> > +               pci_dev_unlock(dev);
> > +       } else
> > +               rc = -EAGAIN;
> > +
> > +       pci_dev_restore(dev);
> > +
> > +       return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_try_reset_function);
> > +
> >  /* Lock devices from the top of the tree down */
> >  static void pci_bus_lock(struct pci_bus *bus)
> >  {
> > @@ -3612,6 +3652,32 @@ static void pci_bus_unlock(struct pci_bus *bus)
> >         }
> >  }
> >
> > +/* Return 1 on successful lock, 0 on contention */
> > +static int pci_bus_trylock(struct pci_bus *bus)
> > +{
> > +       struct pci_dev *dev;
> > +
> > +       list_for_each_entry(dev, &bus->devices, bus_list) {
> > +               if (!pci_dev_trylock(dev))
> > +                       goto unlock;
> > +               if (dev->subordinate) {
> > +                       if (!pci_bus_trylock(dev->subordinate)) {
> > +                               pci_dev_unlock(dev);
> > +                               goto unlock;
> > +                       }
> > +               }
> > +       }
> > +       return 1;
> > +
> > +unlock:
> > +       list_for_each_entry_continue_reverse(dev, &bus->devices, bus_list) {
> > +               if (dev->subordinate)
> > +                       pci_bus_unlock(dev->subordinate);
> > +               pci_dev_unlock(dev);
> > +       }
> > +       return 0;
> > +}
> > +
> >  /* Lock devices from the top of the tree down */
> >  static void pci_slot_lock(struct pci_slot *slot)
> >  {
> > @@ -3640,6 +3706,37 @@ static void pci_slot_unlock(struct pci_slot *slot)
> >         }
> >  }
> >
> > +/* Return 1 on successful lock, 0 on contention */
> > +static int pci_slot_trylock(struct pci_slot *slot)
> > +{
> > +       struct pci_dev *dev;
> > +
> > +       list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> > +               if (!dev->slot || dev->slot != slot)
> > +                       continue;
> > +               if (!pci_dev_trylock(dev))
> > +                       goto unlock;
> > +               if (dev->subordinate) {
> > +                       if (!pci_bus_trylock(dev->subordinate)) {
> > +                               pci_dev_unlock(dev);
> > +                               goto unlock;
> > +                       }
> > +               }
> > +       }
> > +       return 1;
> > +
> > +unlock:
> > +       list_for_each_entry_continue_reverse(dev,
> > +                                            &slot->bus->devices, bus_list) {
> > +               if (!dev->slot || dev->slot != slot)
> > +                       continue;
> > +               if (dev->subordinate)
> > +                       pci_bus_unlock(dev->subordinate);
> > +               pci_dev_unlock(dev);
> > +       }
> > +       return 0;
> > +}
> > +
> >  /* Save and disable devices from the top of the tree down */
> >  static void pci_bus_save_and_disable(struct pci_bus *bus)
> >  {
> > @@ -3763,6 +3860,35 @@ int pci_reset_slot(struct pci_slot *slot)
> >  }
> >  EXPORT_SYMBOL_GPL(pci_reset_slot);
> >
> > +/**
> > + * pci_try_reset_slot - Try to reset a PCI slot
> > + * @slot: PCI slot to reset
> > + *
> > + * Same as above except return -EAGAIN if the slot cannot be locked
> > + */
> > +int pci_try_reset_slot(struct pci_slot *slot)
> > +{
> > +       int rc;
> > +
> > +       rc = pci_slot_reset(slot, 1);
> > +       if (rc)
> > +               return rc;
> > +
> > +       pci_slot_save_and_disable(slot);
> > +
> > +       if (pci_slot_trylock(slot)) {
> > +               might_sleep();
> > +               rc = pci_reset_hotplug_slot(slot->hotplug, 0);
> > +               pci_slot_unlock(slot);
> > +       } else
> > +               rc = -EAGAIN;
> > +
> > +       pci_slot_restore(slot);
> > +
> > +       return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_try_reset_slot);
> > +
> >  static int pci_bus_reset(struct pci_bus *bus, int probe)
> >  {
> >         if (!bus->self)
> > @@ -3822,6 +3948,35 @@ int pci_reset_bus(struct pci_bus *bus)
> >  EXPORT_SYMBOL_GPL(pci_reset_bus);
> >
> >  /**
> > + * pci_try_reset_bus - Try to reset a PCI bus
> > + * @bus: top level PCI bus to reset
> > + *
> > + * Same as above except return -EAGAIN if the bus cannot be locked
> > + */
> > +int pci_try_reset_bus(struct pci_bus *bus)
> > +{
> > +       int rc;
> > +
> > +       rc = pci_bus_reset(bus, 1);
> > +       if (rc)
> > +               return rc;
> > +
> > +       pci_bus_save_and_disable(bus);
> > +
> > +       if (pci_bus_trylock(bus)) {
> > +               might_sleep();
> > +               pci_reset_bridge_secondary_bus(bus->self);
> > +               pci_bus_unlock(bus);
> > +       } else
> > +               rc = -EAGAIN;
> > +
> > +       pci_bus_restore(bus);
> > +
> > +       return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_try_reset_bus);
> > +
> > +/**
> >   * pcix_get_max_mmrbc - get PCI-X maximum designed memory read byte count
> >   * @dev: PCI device to query
> >   *
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 1084a15..34629df 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -951,10 +951,13 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
> >  int __pci_reset_function(struct pci_dev *dev);
> >  int __pci_reset_function_locked(struct pci_dev *dev);
> >  int pci_reset_function(struct pci_dev *dev);
> > +int pci_try_reset_function(struct pci_dev *dev);
> >  int pci_probe_reset_slot(struct pci_slot *slot);
> >  int pci_reset_slot(struct pci_slot *slot);
> > +int pci_try_reset_slot(struct pci_slot *slot);
> >  int pci_probe_reset_bus(struct pci_bus *bus);
> >  int pci_reset_bus(struct pci_bus *bus);
> > +int pci_try_reset_bus(struct pci_bus *bus);
> >  void pci_reset_bridge_secondary_bus(struct pci_dev *dev);
> >  void pci_update_resource(struct pci_dev *dev, int resno);
> >  int __must_check pci_assign_resource(struct pci_dev *dev, int i);
> >



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux