RE: Deadlock during PCIe hot remove

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Folks, it seems that I can avoid the deadlock by replacing the down_write in pciehp_reset_slot with a down_write_trylock.

What would be the down side of doing that? As a reminder, vfio ultimately calls this function when it disables a vfio device.

int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
{
	struct controller *ctrl = to_ctrl(hotplug_slot);
	struct pci_dev *pdev = ctrl_dev(ctrl);
	u16 stat_mask = 0, ctrl_mask = 0;
	int rc;

	if (probe)
		return 0;

	// original:
	// down_write(&ctrl->reset_lock);
	// change:
	if (!down_write_trylock(&ctrl->reset_lock)) {
		return -EAGAIN; // ????? or just 0?
	}
    
	if (!ATTN_BUTTN(ctrl)) {
		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
		stat_mask |= PCI_EXP_SLTSTA_PDC;
	}
	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
	stat_mask |= PCI_EXP_SLTSTA_DLLSC;

	pcie_write_cmd(ctrl, 0, ctrl_mask);
	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);

	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);

	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);

	up_write(&ctrl->reset_lock);

	return rc;
}

-----Original Message-----
From: Christoph Hellwig [mailto:hch@xxxxxx] 
Sent: Wednesday, March 25, 2020 4:40 AM
To: Lukas Wunner <lukas@xxxxxxxxx>
Cc: Haeuptle, Michael <michael.haeuptle@xxxxxxx>; Christoph Hellwig <hch@xxxxxx>; linux-pci@xxxxxxxxxxxxxxx; michaelhaeuptle@xxxxxxxxx
Subject: Re: Deadlock during PCIe hot remove

On Tue, Mar 24, 2020 at 05:15:34PM +0100, Lukas Wunner wrote:
> The pci_dev_trylock() in pci_try_reset_function() looks questionable 
> to me.  It was added by commit b014e96d1abb ("PCI: Protect
> pci_error_handlers->reset_notify() usage with device_lock()") with the 
> following rationale:
> 
>     Every method in struct device_driver or structures derived from it like
>     struct pci_driver MUST provide exclusion vs the driver's ->remove()
>     method, usually by using device_lock().
>     [...]
>     Without this, ->reset_notify() may race with ->remove() calls, which
>     can be easily triggered in NVMe.
> 
> The intersection of drivers defining a ->reset_notify() hook and files 
> invoking pci_try_reset_function() appears to be empty.  So I don't 
> quite understand the problem the commit sought to address.  What am I missing?

No driver defines ->reset_notify as that has been split into
->reset_prepare and ->reset_done a while ago, and plenty of drivers
define those.  And we can't call into drivers unless we know the driver actually still is bound to the device, which is why we need the locking.




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux