Re: [PATCH] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi All,

Just wanted to touch base on the status of this patch.  It resolves a deadlock I am seeing between AER and Hotplug when a controller is reset.  I have a machine that can consistently recreate this deadlock, so if there is any further investigation needed I'd be willing to help in any way I can.

Thanks,
Ian

On 2020-07-21 13:17:50 , Lukas Wunner wrote:
> Back in 2013, commits
> 
>   2e35afaefe64 ("PCI: pciehp: Add reset_slot() method")
>   608c388122c7 ("PCI: Add slot reset option to pci_dev_reset()")
> 
> introduced the callback pciehp_reset_slot() to the PCIe hotplug driver
> and amended __pci_dev_reset() (today __pci_reset_function_locked()) to
> invoke it when resetting a hotplug port's child.  The callback performs
> a Secondary Bus Reset and ensures that an ensuing link or presence flap
> is ignored by pciehp.
> 
> However the commits did not perform any locking, in particular:
> 
> * No precautions were taken to prevent concurrent execution of the new
>   callback with pciehp's IRQ handler or a sysfs request to bring the
>   slot up or down.  These code paths may see flapping link or presence
>   bits during a slot reset.
> 
> * pciehp is not prevented from unbinding while the new callback accesses
>   its struct controller.  Commit 608c388122c7 did take a reference on
>   pciehp's module, but that's not sufficient.  It only keeps pciehp's
>   code in memory, but doesn't prevent unbinding.
> 
> * In pci_dev_reset_slot_function(), commit 608c388122c7 iterates over
>   the devices on a bus without holding pci_bus_sem.
> 
> In 2018, commit
> 
>   5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
> 
> sought to address the first of these locking issues:  It introduced a
> reset_lock which serializes a slot reset with other parts of pciehp.
> 
> But Michael Haeuptle reports that deadlocks now occur on simultaneous
> hot-removal and reset of vfio devices because pciehp acquires the
> reset_lock and the device_lock in a different order than
> pci_try_reset_function():
> 
> pciehp_ist()                                    # down_read(reset_lock)
>   pciehp_handle_presence_or_link_change()
>     pciehp_disable_slot()
>       __pciehp_disable_slot()
>         remove_board()
>           pciehp_unconfigure_device()
>             pci_stop_and_remove_bus_device()
>               pci_stop_bus_device()
>                 pci_stop_dev()
>                   device_release_driver()
>                     device_release_driver_internal()
>                       __device_driver_lock()    # device_lock()
> 
> SYS_munmap()
>   vfio_device_fops_release()
>     vfio_pci_release()
>       vfio_pci_disable()
>         pci_try_reset_function()                # device_lock()
>           __pci_reset_function_locked()
>             pci_reset_hotplug_slot()
>               pciehp_reset_slot()               # down_write(reset_lock)
> 
> Ian May reports the same deadlock on simultaneous hot-removal and AER
> reset:
> 
> aer_recover_work_func()
>   pcie_do_recovery()
>     aer_root_reset()
>       pci_bus_error_reset()
>         pci_slot_reset()
>           pci_slot_lock()                       # device_lock()
>             pci_reset_hotplug_slot()
>               pciehp_reset_slot()               # down_write(reset_lock)
> 
> Fix by pushing the reset_lock out of pciehp's struct controller and into
> struct pci_slot such that it can be taken by the PCI core before taking
> the device lock.
> 
> There's a catch though:  Some drivers call __pci_reset_function_locked()
> directly and the function expects that all necessary locks, including
> the reset_lock, have been acquired by the caller.  There are callers
> which already hold the device_lock, so they can't acquire the reset_lock
> without re-introducing the AB-BA deadlock:
> 
> * drivers/net/ethernet/cavium/liquidio/lio_main.c: octeon_pci_flr()
> * drivers/xen/xen-pciback/pci_stub.c: pcistub_device_release()
> * drivers/xen/xen-pciback/pci_stub.c: pcistub_init_device() (if called
>   from pcistub_seize())
> 
> In the case of octeon_pci_flr(), the device is reset on driver unbind,
> which is why the device_lock is already held.  A possible solution might
> be to add a flag in struct pci_dev with which drivers tell the PCI core
> that the device is handed back in an unclean state and needs a reset.
> The PCI core would then perform the reset on behalf of the driver after
> it has unbound and before any new driver is bound.
> 
> As for xen, this patch (which was never applied) explains that a reset
> is performed on bind, unbind and on un-assigning a device from a guest:
> 
>   https://lore.kernel.org/patchwork/patch/848180/
> 
> The unbind code path could be solved by the same solution as for octeon
> and it may also be possible to make it work on bind, though it's unclear
> why a reset on bind is at all necessary.  The un-assigning code path is
> fixed by the present commit I think.
> 
> For now, the three functions do not acquire the reset_lock.  I'm
> inserting a lockdep_assert_held_write() so that a lockdep splat is shown
> as a reminder that liquidio and xen require fixing.
> 
> Fixes: 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
> Link: https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Link: https://lore.kernel.org/linux-pci/20200615143250.438252-1-ian.may@xxxxxxxxxxxxx
> Reported-and-tested-by: Michael Haeuptle <michael.haeuptle@xxxxxxx>
> Reported-and-tested-by: Ian May <ian.may@xxxxxxxxxxxxx>
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # v4.19+
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> ---
>  drivers/pci/hotplug/pciehp.h          |  5 -----
>  drivers/pci/hotplug/pciehp_core.c     |  4 ++--
>  drivers/pci/hotplug/pciehp_hpc.c      | 12 ++++++------
>  drivers/pci/pci.c                     | 17 +++++++++++++++++
>  drivers/pci/slot.c                    |  2 ++
>  drivers/vfio/pci/vfio_pci.c           | 19 +++++++++++++------
>  drivers/xen/xen-pciback/passthrough.c | 14 ++++++++++++--
>  drivers/xen/xen-pciback/pci_stub.c    |  6 ++++++
>  drivers/xen/xen-pciback/vpci.c        | 14 ++++++++++++--
>  include/linux/pci.h                   |  8 +++++++-
>  10 files changed, 77 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index 4fd200d..676e579 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -20,7 +20,6 @@
>  #include <linux/pci_hotplug.h>
>  #include <linux/delay.h>
>  #include <linux/mutex.h>
> -#include <linux/rwsem.h>
>  #include <linux/workqueue.h>
>  
>  #include "../pcie/portdrv.h"
> @@ -69,9 +68,6 @@
>   * @button_work: work item to turn the slot on or off after 5 seconds
>   *	in response to an Attention Button press
>   * @hotplug_slot: structure registered with the PCI hotplug core
> - * @reset_lock: prevents access to the Data Link Layer Link Active bit in the
> - *	Link Status register and to the Presence Detect State bit in the Slot
> - *	Status register during a slot reset which may cause them to flap
>   * @ist_running: flag to keep user request waiting while IRQ thread is running
>   * @request_result: result of last user request submitted to the IRQ thread
>   * @requester: wait queue to wake up on completion of user request,
> @@ -102,7 +98,6 @@ struct controller {
>  	struct delayed_work button_work;
>  
>  	struct hotplug_slot hotplug_slot;	/* hotplug core interface */
> -	struct rw_semaphore reset_lock;
>  	unsigned int ist_running;
>  	int request_result;
>  	wait_queue_head_t requester;
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index bf779f2..cdb241b 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -165,7 +165,7 @@ static void pciehp_check_presence(struct controller *ctrl)
>  {
>  	int occupied;
>  
> -	down_read(&ctrl->reset_lock);
> +	down_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
>  	mutex_lock(&ctrl->state_lock);
>  
>  	occupied = pciehp_card_present_or_link_active(ctrl);
> @@ -176,7 +176,7 @@ static void pciehp_check_presence(struct controller *ctrl)
>  		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
>  
>  	mutex_unlock(&ctrl->state_lock);
> -	up_read(&ctrl->reset_lock);
> +	up_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
>  }
>  
>  static int pciehp_probe(struct pcie_device *dev)
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 53433b3..a1c9072 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -706,13 +706,17 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
>  	/*
>  	 * Disable requests have higher priority than Presence Detect Changed
>  	 * or Data Link Layer State Changed events.
> +	 *
> +	 * A slot reset may cause flaps of the Presence Detect State bit in the
> +	 * Slot Status register and the Data Link Layer Link Active bit in the
> +	 * Link Status register.  Prevent by holding the reset lock.
>  	 */
> -	down_read(&ctrl->reset_lock);
> +	down_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
>  	if (events & DISABLE_SLOT)
>  		pciehp_handle_disable_request(ctrl);
>  	else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
>  		pciehp_handle_presence_or_link_change(ctrl, events);
> -	up_read(&ctrl->reset_lock);
> +	up_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
>  
>  	ret = IRQ_HANDLED;
>  out:
> @@ -841,8 +845,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
>  	if (probe)
>  		return 0;
>  
> -	down_write(&ctrl->reset_lock);
> -
>  	if (!ATTN_BUTTN(ctrl)) {
>  		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
>  		stat_mask |= PCI_EXP_SLTSTA_PDC;
> @@ -861,7 +863,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
>  	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
>  		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
>  
> -	up_write(&ctrl->reset_lock);
>  	return rc;
>  }
>  
> @@ -925,7 +926,6 @@ struct controller *pcie_init(struct pcie_device *dev)
>  	ctrl->slot_cap = slot_cap;
>  	mutex_init(&ctrl->ctrl_lock);
>  	mutex_init(&ctrl->state_lock);
> -	init_rwsem(&ctrl->reset_lock);
>  	init_waitqueue_head(&ctrl->requester);
>  	init_waitqueue_head(&ctrl->queue);
>  	INIT_DELAYED_WORK(&ctrl->button_work, pciehp_queue_pushbutton_work);
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 45c51af..455da72 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4902,6 +4902,8 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, int probe)
>  	if (!hotplug || !try_module_get(hotplug->owner))
>  		return rc;
>  
> +	lockdep_assert_held_write(&hotplug->pci_slot->reset_lock);
> +
>  	if (hotplug->ops->reset_slot)
>  		rc = hotplug->ops->reset_slot(hotplug, probe);
>  
> @@ -5110,6 +5112,8 @@ int pci_reset_function(struct pci_dev *dev)
>  	if (!dev->reset_fn)
>  		return -ENOTTY;
>  
> +	if (dev->slot)
> +		down_write(&dev->slot->reset_lock);
>  	pci_dev_lock(dev);
>  	pci_dev_save_and_disable(dev);
>  
> @@ -5117,6 +5121,8 @@ int pci_reset_function(struct pci_dev *dev)
>  
>  	pci_dev_restore(dev);
>  	pci_dev_unlock(dev);
> +	if (dev->slot)
> +		up_write(&dev->slot->reset_lock);
>  
>  	return rc;
>  }
> @@ -5169,6 +5175,9 @@ int pci_try_reset_function(struct pci_dev *dev)
>  	if (!dev->reset_fn)
>  		return -ENOTTY;
>  
> +	if (dev->slot && !down_write_trylock(&dev->slot->reset_lock))
> +		return -EAGAIN;
> +
>  	if (!pci_dev_trylock(dev))
>  		return -EAGAIN;
>  
> @@ -5176,6 +5185,8 @@ int pci_try_reset_function(struct pci_dev *dev)
>  	rc = __pci_reset_function_locked(dev);
>  	pci_dev_restore(dev);
>  	pci_dev_unlock(dev);
> +	if (dev->slot)
> +		up_write(&dev->slot->reset_lock);
>  
>  	return rc;
>  }
> @@ -5274,6 +5285,7 @@ static void pci_slot_lock(struct pci_slot *slot)
>  {
>  	struct pci_dev *dev;
>  
> +	down_write(&slot->reset_lock);
>  	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
>  		if (!dev->slot || dev->slot != slot)
>  			continue;
> @@ -5295,6 +5307,7 @@ static void pci_slot_unlock(struct pci_slot *slot)
>  			pci_bus_unlock(dev->subordinate);
>  		pci_dev_unlock(dev);
>  	}
> +	up_write(&slot->reset_lock);
>  }
>  
>  /* Return 1 on successful lock, 0 on contention */
> @@ -5302,6 +5315,9 @@ static int pci_slot_trylock(struct pci_slot *slot)
>  {
>  	struct pci_dev *dev;
>  
> +	if (!down_write_trylock(&slot->reset_lock))
> +		return 0;
> +
>  	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
>  		if (!dev->slot || dev->slot != slot)
>  			continue;
> @@ -5325,6 +5341,7 @@ static int pci_slot_trylock(struct pci_slot *slot)
>  			pci_bus_unlock(dev->subordinate);
>  		pci_dev_unlock(dev);
>  	}
> +	up_write(&slot->reset_lock);
>  	return 0;
>  }
>  
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index cc386ef..e8e7d09 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -279,6 +279,8 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>  	INIT_LIST_HEAD(&slot->list);
>  	list_add(&slot->list, &parent->slots);
>  
> +	init_rwsem(&slot->reset_lock);
> +
>  	down_read(&pci_bus_sem);
>  	list_for_each_entry(dev, &parent->devices, bus_list)
>  		if (PCI_SLOT(dev->devfn) == slot_nr)
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index f634c81..260650e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -454,13 +454,20 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
>  	 * We can not use the "try" reset interface here, which will
>  	 * overwrite the previously restored configuration information.
>  	 */
> -	if (vdev->reset_works && pci_cfg_access_trylock(pdev)) {
> -		if (device_trylock(&pdev->dev)) {
> -			if (!__pci_reset_function_locked(pdev))
> -				vdev->needs_reset = false;
> -			device_unlock(&pdev->dev);
> +	if (vdev->reset_works) {
> +		if (!pdev->slot ||
> +		    down_write_trylock(&pdev->slot->reset_lock)) {
> +			if (pci_cfg_access_trylock(pdev)) {
> +				if (device_trylock(&pdev->dev)) {
> +					if (!__pci_reset_function_locked(pdev))
> +						vdev->needs_reset = false;
> +					device_unlock(&pdev->dev);
> +				}
> +				pci_cfg_access_unlock(pdev);
> +			}
> +			if (pdev->slot)
> +				up_write(&pdev->slot->reset_lock);
>  		}
> -		pci_cfg_access_unlock(pdev);
>  	}
>  
>  	pci_restore_state(pdev);
> diff --git a/drivers/xen/xen-pciback/passthrough.c b/drivers/xen/xen-pciback/passthrough.c
> index 66e9b81..98a9ec8 100644
> --- a/drivers/xen/xen-pciback/passthrough.c
> +++ b/drivers/xen/xen-pciback/passthrough.c
> @@ -89,11 +89,17 @@ static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
>  	mutex_unlock(&dev_data->lock);
>  
>  	if (found_dev) {
> -		if (lock)
> +		if (lock) {
> +			if (found_dev->slot)
> +				down_write(&found_dev->slot->reset_lock);
>  			device_lock(&found_dev->dev);
> +		}
>  		pcistub_put_pci_dev(found_dev);
> -		if (lock)
> +		if (lock) {
>  			device_unlock(&found_dev->dev);
> +			if (found_dev->slot)
> +				up_write(&found_dev->slot->reset_lock);
> +		}
>  	}
>  }
>  
> @@ -164,9 +170,13 @@ static void __xen_pcibk_release_devices(struct xen_pcibk_device *pdev)
>  	list_for_each_entry_safe(dev_entry, t, &dev_data->dev_list, list) {
>  		struct pci_dev *dev = dev_entry->dev;
>  		list_del(&dev_entry->list);
> +		if (dev->slot)
> +			down_write(&dev->slot->reset_lock);
>  		device_lock(&dev->dev);
>  		pcistub_put_pci_dev(dev);
>  		device_unlock(&dev->dev);
> +		if (dev->slot)
> +			up_write(&dev->slot->reset_lock);
>  		kfree(dev_entry);
>  	}
>  
> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
> index e876c3d..91779a2 100644
> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -463,6 +463,9 @@ static int __init pcistub_init_devices_late(void)
>  
>  		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
>  
> +		if (psdev->dev->slot)
> +			down_write(&psdev->dev->slot->reset_lock);
> +		device_lock(&psdev->dev->dev);
>  		err = pcistub_init_device(psdev->dev);
>  		if (err) {
>  			dev_err(&psdev->dev->dev,
> @@ -470,6 +473,9 @@ static int __init pcistub_init_devices_late(void)
>  			kfree(psdev);
>  			psdev = NULL;
>  		}
> +		device_unlock(&psdev->dev->dev);
> +		if (psdev->dev->slot)
> +			up_write(&psdev->dev->slot->reset_lock);
>  
>  		spin_lock_irqsave(&pcistub_devices_lock, flags);
>  
> diff --git a/drivers/xen/xen-pciback/vpci.c b/drivers/xen/xen-pciback/vpci.c
> index 5447b5a..d157b1d 100644
> --- a/drivers/xen/xen-pciback/vpci.c
> +++ b/drivers/xen/xen-pciback/vpci.c
> @@ -171,11 +171,17 @@ static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
>  	mutex_unlock(&vpci_dev->lock);
>  
>  	if (found_dev) {
> -		if (lock)
> +		if (lock) {
> +			if (found_dev->slot)
> +				down_write(&found_dev->slot->reset_lock);
>  			device_lock(&found_dev->dev);
> +		}
>  		pcistub_put_pci_dev(found_dev);
> -		if (lock)
> +		if (lock) {
>  			device_unlock(&found_dev->dev);
> +			if (found_dev->slot)
> +				up_write(&found_dev->slot->reset_lock);
> +		}
>  	}
>  }
>  
> @@ -216,9 +222,13 @@ static void __xen_pcibk_release_devices(struct xen_pcibk_device *pdev)
>  					 list) {
>  			struct pci_dev *dev = e->dev;
>  			list_del(&e->list);
> +			if (dev->slot)
> +				down_write(&dev->slot->reset_lock);
>  			device_lock(&dev->dev);
>  			pcistub_put_pci_dev(dev);
>  			device_unlock(&dev->dev);
> +			if (dev->slot)
> +				up_write(&dev->slot->reset_lock);
>  			kfree(e);
>  		}
>  	}
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2a2d00e..12869bd 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -38,6 +38,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
>  #include <linux/resource_ext.h>
> +#include <linux/rwsem.h>
>  #include <uapi/linux/pci.h>
>  
>  #include <linux/pci_ids.h>
> @@ -65,11 +66,16 @@
>  /* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
>  #define PCI_BUS_NUM(x) (((x) >> 8) & 0xff)
>  
> -/* pci_slot represents a physical slot */
> +/**
> + * struct pci_slot - represents a physical slot
> + * @reset_lock: held for writing during a slot reset; acquire for reading to
> + *	protect access to register bits which may flap upon a reset
> + */
>  struct pci_slot {
>  	struct pci_bus		*bus;		/* Bus this slot is on */
>  	struct list_head	list;		/* Node in list of slots */
>  	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
> +	struct rw_semaphore	reset_lock;
>  	unsigned char		number;		/* PCI_SLOT(pci_dev->devfn) */
>  	struct kobject		kobj;
>  };
> -- 
> 2.27.0
> 



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux