Hi Vidya,

sorry for the delay, still catching up on e-mails after Plumbers...

On Fri, Nov 10, 2023 at 10:31:55PM +0530, Vidya Sagar wrote:
> > - System doesn't have support for in-band PD and supports only OOB PD
> >   where writing to a private register would set the PD state

We already have an inband_presence_disabled flag in struct controller
which is set if the In-Band PD Disable Supported bit in the Slot
Capabilities 2 Register is set.  The flag may also be set through the
inband_presence_disabled_dmi_table[].

Currently the only place where the flag makes a difference is on slot
bringup:  pciehp_check_link_status() doesn't wait for the Presence
Detect Status bit to become set.

I'm wondering if we need to also disregard PDC events if In-Band PD
is disabled.  Not sure if the behavior you're seeing is caused by a
quirk of the hardware or is expected if In-Band PD is disabled.
Probably the former.  A code change would generally only be acceptable
in the latter case though I think.

> > 10. Since PDC (Presence Detect Change) bit is also set for the first
> > interrupt, IST attempts to remove the devices (as part of
> > pciehp_handle_presence_or_link_change())
> >
> > At this point, there is a race between the device driver that is
> > trying to work with the device (through pci_error_handlers callback)
> > and the IST that is trying to remove the device.
> > To be fair to pciehp_handle_presence_or_link_change(), after removing
> > the devices, it checks for the link-up/PD being '1' and scans the
> > devices again if the device is still available. But unfortunately,
> > IST is deadlocked (with the device driver) while removing the devices
> > itself and won't go to the next step.

Could you provide stacktraces of the two deadlocked tasks?  Right now
I don't quite understand why they're deadlocked.

Are you getting hung task messages in dmesg?  They should include
stacktraces.

Also, which kernel version are we talking about?

Thanks,

Lukas