On 2018/4/10 6:04, Keith Busch wrote:
AER error handling walks the PCI topology below a root port, saving pointers to the pci_dev structs affected by the error along the way. At the same time, the PCIe hotplug driver could be freeing those very same structures, causing the AER driver to reference freed memory.
I have also run into this issue. For details, see the link below:
https://www.spinics.net/lists/linux-pci/msg70614.html

It seems that reset_link() triggers the hotplug driver to remove and rescan the device when an AER fatal error occurs:

do_recovery()
  --- reset_link()
    --- pci_reset_secondary_bus() // then triggers link down/up

So if the root port supports hotplug, the device will be removed and rescanned.

I still have a question: since the AER driver has already done the recovery, it seems there is no need to trigger hotplug to remove and rescan the device.

Thanks,
Dongdong
This series fixes this by synchronizing the AER driver with the PCI hotplug driver. The final patch is the one that ultimately provides the locking, by having AER take the same PCI rescan/remove mutex as the pciehp driver. The first three patches prepare for that by making it safe for the AER bottom half handler to hold that lock.

Keith Busch (4):
  PCI/AER: Remove unused parameters
  PCI/AER: Replace struct pcie_device with pci_dev
  PCI/AER: Reference count aer structures
  PCI/AER: Lock pci topology when scanning errors

 drivers/pci/pcie/aer/aerdrv.c      | 28 +++++++++++++++++++++-------
 drivers/pci/pcie/aer/aerdrv.h      |  9 +++------
 drivers/pci/pcie/aer/aerdrv_core.c | 38 +++++++++++++++++--------------------
 3 files changed, 41 insertions(+), 34 deletions(-)
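
For readers who want to see the locking idea in isolation, here is a rough user-space sketch. It is not code from this series. It assumes a toy singly linked "bus" and a pthread mutex standing in for the PCI rescan/remove mutex, which in the kernel is taken with pci_lock_rescan_remove() and released with pci_unlock_rescan_remove(). Every name in the sketch (fake_dev, fake_add, fake_remove, fake_aer_scan) is hypothetical. The point is only that the error scan and the removal path take the same lock, so a device cannot be freed in the middle of a walk.

```c
/* User-space analogy only; none of these names exist in the kernel. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct fake_dev {
	int id;
	int error;              /* nonzero if this device reported an error */
	struct fake_dev *next;
};

/* Stands in for the rescan/remove mutex shared by both paths. */
pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;
struct fake_dev *bus_head;

/* "Hotplug" side: link a new device into the bus under the lock. */
int fake_add(int id, int error)
{
	struct fake_dev *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->id = id;
	d->error = error;

	pthread_mutex_lock(&rescan_remove_lock);
	d->next = bus_head;
	bus_head = d;
	pthread_mutex_unlock(&rescan_remove_lock);
	return 0;
}

/* "Hotplug" side: unlink and free a device under the lock.
 * Returns 1 if the device was found and removed, 0 otherwise. */
int fake_remove(int id)
{
	int found = 0;

	pthread_mutex_lock(&rescan_remove_lock);
	for (struct fake_dev **pp = &bus_head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct fake_dev *victim = *pp;

			*pp = victim->next;
			free(victim);
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&rescan_remove_lock);
	return found;
}

/* "AER" side: walk the bus under the same lock and count the devices
 * that reported an error. Holding the lock for the whole walk means
 * fake_remove() cannot free a device while it is being inspected. */
int fake_aer_scan(void)
{
	int n = 0;

	pthread_mutex_lock(&rescan_remove_lock);
	for (struct fake_dev *d = bus_head; d; d = d->next) {
		assert(d->id >= 0);
		if (d->error)
			n++;
	}
	pthread_mutex_unlock(&rescan_remove_lock);
	return n;
}
```

The sketch holds the lock across the entire walk rather than per device. That matches the description above of AER taking the lock while scanning, and it is presumably why the handler must be able to sleep on a mutex in the first place.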