Re: Race b/w pciehp and FirmwareFirst DPC->EDR flow - Reg

Vidya Sagar <vidyas@xxxxxxxxxx> · Mon, 27 Nov 2023 20:13:50 +0530

Hi Bjorn and others,
Could you please share your thoughts on this?

Thanks,
Vidya Sagar

On 11/10/2023 10:31 PM, Vidya Sagar wrote:
There was a typo. Corrected it.

s/If DPC->EDR flow checks DPC also/If DPC->EDR flow checks 'PD Change' also

On 11/10/2023 6:30 PM, Vidya Sagar wrote:
Hi folks,
Here are the platform details.
- System with a Firmware First approach for both AER and DPC
- CPERs are sent to the OS for the AER events
- DPC is notified to the OS through EDR mechanism
- System doesn't have support for in-band PD and supports only OOB PD 
where writing to a private register would set the PD state
- System has this design where PD state gets cleared whenever there is 
a link down (without even writing to the private register)

In this system, if the root port has to experience a DPC because of 
any right reasons (say SW trigger of DPC for time being), I'm 
observing the following flow which is causing a race.
1. Since DPC brings the link down, there is a DLLSC and an MSI is sent 
to the OS hence PCIe HP ISR is invoked.
2. ISR then takes stock of the Slot_Status register to see what all 
events caused the MSI generation.
3. It sees both DLLSC and PDC bits getting set.
4. PDC is set because of the aforementioned hardware design where for 
every link down, PD state gets cleared and since this is considered as 
a change in the PD state, PDC also gets set.
5. PCIe HP ISR flow transitions to the PCIe HP IST (interrupt 
thread/bottom half) and waits for the DPC_EDR to complete (because 
DLLSC is one of the events)
6. Parallel to the PCIe HP ISR/IST, DPC interrupt is raised to the 
firmware and that would then send an EDR event to the OS. Firmware 
also sets the PD state to '1' before that, as the endpoint device is 
still available.
7. Firmware programming of the private register to set the PD state 
raises second interrupt and PCIe HP ISR takes stock of the events and 
this time it is only the PDC and not DLLSC. ISR sends a wake to the 
IST, but since there is an IST in progress already, nothing much 
happens at this point.
8. Once the DPC is completed and link comes back up again, DPC 
framework asks the endpoint drivers to handle it by calling the 
'pci_error_handlers' callabacks registered by the endpoint device 
drivers.
9. The PCIe HP IST (of the very first MSI) resumes after receiving the 
completion from the DPC framework saying that DPC recovery is successful.
10. Since PDC (Presence Detect Change) bit is also set for the first 
interrupt, IST attempts to remove the devices (as part of 
pciehp_handle_presence_or_link_change())

At this point, there is a race between the device driver that is 
trying to work with the device (through pci_error_handlers callback) 
and the IST that is trying to remove the device.
To be fair to pciehp_handle_presence_or_link_change(), after removing 
the devices, it checks for the link-up/PD being '1' and scans the 
devices again if the device is still available. But unfortunately, IST 
is deadlocked (with the device driver) while removing the devices 
itself and won't go to the next step.

The rationale given in the form of a comment in 
pciehp_handle_presence_or_link_change() for removing the devices first 
(without checking PD/link-up) and adding them back after checking 
link-up/PD is that, since there is a change in PD, the new link-up 
need not be with the same old device rather it could be with a 
different device. So, it justifies in removing the existing devices 
and then scanning for new ones. But this is causing deadlock with the 
already initiated DPC recovery process.

I see two ways to avoid the deadlock in this scenario.
1. When PCIe HP IST is looking at both DLLSC and PDC, why should 
DPC->EDR flow look at only DLLSC and not DPC? If DPC->EDR flow checks 
'PD Change' also (along with DLL) and declares that the DPC recovery 
can't happen (as there could be a change of device) hence returning 
DPC recovery failure, then, the device driver's pci_error_handlers 
callbacks won't be called and there won't be any deadlock.
2. Check for the possibility of a common lock so that PCIe HP IST and 
device driver's pci_error_handlers callbacks don't race.

Let me know your comments on this.

Thanks,
Vidya Sagar