Re: Possible race condition in the kernel between PCI driver and AER handling


 




On 08/01/2018 10:24 AM, Thomas Tai wrote:


On 08/01/2018 01:53 AM, gokul cg wrote:
Hi,

I see there is a basic design flaw. Because the AER and PCI drivers are independent modules, storing a pointer to any data structure from the PCI device list locally in the AER driver will cause problems, since there is no synchronization between the two.


https://elixir.bootlin.com/linux/v3.10.99/source/drivers/pci/pcie/aer/aerdrv_core.c#L701
Here 'struct aer_err_info *e_info' holds a pointer to a pci_dev, which can be removed from the PCI tree at any time.
I think this is the basic issue.
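
To make the concern concrete, here is a minimal userspace sketch of the usual remedy for a cached pointer: take a counted reference when the pointer is stored and drop it when the holder is done. All names (fake_dev, fake_err_info and their helpers) are hypothetical stand-ins, not kernel code. In the kernel the corresponding helpers are pci_dev_get() and pci_dev_put(); whether the v3.10 AER path takes such a reference has not been checked here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_dev. The plain int counter is not
 * thread-safe; real kernel code uses atomic reference counts. */
struct fake_dev {
    int refcount;
    bool *released;   /* set to true when the last reference is dropped */
};

/* Allocate a device holding one reference, owned by the "bus list". */
struct fake_dev *fake_dev_alloc(bool *released)
{
    struct fake_dev *dev = malloc(sizeof(*dev));
    if (!dev)
        return NULL;
    dev->refcount = 1;
    dev->released = released;
    *released = false;
    return dev;
}

/* Take an extra reference (analogue of pci_dev_get()). */
struct fake_dev *fake_dev_get(struct fake_dev *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

/* Drop a reference; free on the last one (analogue of pci_dev_put()). */
void fake_dev_put(struct fake_dev *dev)
{
    if (!dev)
        return;
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        *dev->released = true;
        free(dev);
    }
}

/* Hypothetical analogue of struct aer_err_info: caches a device pointer. */
struct fake_err_info {
    struct fake_dev *dev;
};

/* Store the pointer together with a reference, so removal of the device
 * from the bus list cannot free it while the error is being handled. */
void fake_err_info_add(struct fake_err_info *info, struct fake_dev *dev)
{
    info->dev = fake_dev_get(dev);
}

/* Release the cached reference once error handling is finished. */
void fake_err_info_done(struct fake_err_info *info)
{
    fake_dev_put(info->dev);
    info->dev = NULL;
}
```

With this pattern, a hot-remove that drops the bus list's reference leaves the object alive until fake_err_info_done() runs. Without the extra reference, the cached pointer would dangle as soon as the device is removed.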

Hi Gokul,

I'm afraid I am having a hard time reproducing your issue. The following is what the normal case looks like; did you see any hotplug messages before the AER message?

pcieport 0000:00:02.2: AER: Corrected error received: id=1130
pciehp 0000:11:06.0:pcie204: Slot(102): Link Down
pciehp 0000:11:06.0:pcie204: Slot(102): Link Down event ignored; already powering off
pcieport 0000:11:06.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=1130(Receiver ID)
pcieport 0000:11:06.0: device [111d:80b5] error status/mask=00000001/0000e000
pcieport 0000:11:06.0:    [ 0] Receiver Error

As for the pci_dev being corrupted, maybe you could add "slub_debug=FZP" to your kernel boot arguments, rerun your test, and see whether it finds anything. I am curious about who corrupted the pci_dev in the first place. I am not totally convinced that the problem is in the AER code.

Cheers,
Thomas


