On 11/8/18 2:09 PM, Bjorn Helgaas wrote:
> [+cc Jonathan, Greg, Lukas, Russell, Sam, Oliver for discussion about
> PCI error recovery in general]
>
> On Wed, Nov 07, 2018 at 05:42:57PM -0600, Bjorn Helgaas wrote:
>> On Tue, Sep 18, 2018 at 05:15:00PM -0500, Alexandru Gagniuc wrote:
>>> When a PCI device is gone, we don't want to send IO to it if we can
>>> avoid it. We expose functionality via the irq_chip structure. As
>>> users of that structure may not know about the underlying PCI device,
>>> it's our responsibility to guard against removed devices.
>>>
>>> .irq_write_msi_msg() is already guarded inside __pci_write_msi_msg().
>>> .irq_mask/unmask() are not. Guard them for completeness.
>>>
>>> For example, surprise removal of a PCIe device triggers teardown. This
>>> touches the irq_chips ops some point to disable the interrupts. I/O
>>> generated here can crash the system on firmware-first machines.
>>> Not triggering the IO in the first place greatly reduces the
>>> possibility of the problem occurring.
>>>
>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@xxxxxxxxx>
>>
>> Applied to pci/misc for v4.21, thanks!
>
> I'm having second thoughts about this.

Do we have a verdict on this? If you don't like this approach, then I'll
have to fix the problem in some other way, but the problem still needs
to be fixed.

Alex