On 8/21/2018 10:37 AM, Keith Busch wrote:
On Tue, Aug 21, 2018 at 04:06:30PM +1000, Benjamin Herrenschmidt wrote:
On Tue, 2018-08-21 at 10:44 +0530, poza@xxxxxxxxxxxxxx wrote:
OK, let me summarize what has been discussed so far.
It would be nice if we all (Bjorn, Keith, Ben, Sinan) could reach a consensus
on this.
1) Right now AER and DPC both call pcie_do_fatal_recovery(). I mainly
see DPC as an error handling and recovery agent rather than something used
for hotplug, so in my opinion both AER and DPC should have the same error
handling and recovery mechanism.
Yes.
So if there is a way to detect that, in the absence of pciehp, DPC
is being used to support hotplug, then we fall back to the original DPC
mechanism (which is to remove the devices).
Not exactly. Only if the presence detect change indicates that it was a
hotplug event rather than an error.
The actions associated with error recovery will trigger link state changes
on a lot of existing hardware. pciehp currently performs the same removal
sequence for both link state change (DLLSC) and presence detect change
(PDC) events.
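To make the current behaviour described above concrete, here is a tiny model in C. The enum and function names (`hp_event`, `hp_action`, `pciehp_current_policy`) are hypothetical and are not taken from the pciehp source; the sketch only encodes the stated policy that DLLSC and PDC both lead to the same removal sequence.

```c
#include <assert.h>

/* Hypothetical names for illustration; not kernel definitions. */
enum hp_event { HP_EV_DLLSC, HP_EV_PDC };
enum hp_action { HP_ACT_NONE, HP_ACT_REMOVE };

/* Current policy as described: a Data Link Layer State Changed event and
 * a Presence Detect Changed event both run the same removal sequence, so
 * a link drop caused by error recovery also removes the devices. */
static enum hp_action pciehp_current_policy(enum hp_event ev)
{
	switch (ev) {
	case HP_EV_DLLSC:
	case HP_EV_PDC:
		return HP_ACT_REMOVE;
	}
	assert(!"unknown hotplug event");
	return HP_ACT_NONE;
}
```

Because the model never looks at why the link changed, a DLLSC raised as a side effect of AER/DPC recovery is indistinguishable from a real unplug.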
It sounds like you want pciehp to do nothing on the DLLSC events that it
currently handles, and instead do the board removal only on PDC. If that
is the case, is the intent to not remove devices downstream of a permanently
disabled link, or does that responsibility fall to some other component?
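The PDC-only alternative paraphrased above, and the gap the question points at, can be sketched the same way. Again the names (`pciehp_pdc_only_policy`, `devices_stranded`) are hypothetical; this is a simplified model of the proposal as paraphrased, not an implementation anyone posted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not kernel definitions. */
enum hp_event { HP_EV_DLLSC, HP_EV_PDC };
enum hp_action { HP_ACT_NONE, HP_ACT_REMOVE };

/* PDC-only policy: ignore link state changes and remove the
 * devices only when presence detect changes. */
static enum hp_action pciehp_pdc_only_policy(enum hp_event ev)
{
	assert(ev == HP_EV_DLLSC || ev == HP_EV_PDC);
	return ev == HP_EV_PDC ? HP_ACT_REMOVE : HP_ACT_NONE;
}

/* The open question: if the link goes down for good and no PDC ever
 * arrives, nothing in this policy removes the devices below the port.
 * Returns true when devices would be left behind a dead link. */
static bool devices_stranded(enum hp_event ev, bool link_permanently_down)
{
	return link_permanently_down &&
	       pciehp_pdc_only_policy(ev) == HP_ACT_NONE;
}
```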
Looking at PDC is not enough. Today the hotplug driver handles both physical
removal and the link going down due to signal integrity issues.
If the link went down because of a pending FATAL error, AER/DPC recovers
the link automatically. There is no need for hotplug to be involved in
fatal error handling.
The hotplug driver still needs to handle both physical removal and
intermittent link-down issues, though.
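The division of responsibility argued for above can be summarized as a small decision function. This is a deliberately simplified model under stated assumptions: the names (`link_event_owner`, `OWNER_AER_DPC`, `OWNER_HOTPLUG`) are hypothetical, and it ignores harder cases such as a surprise removal that also raises a fatal error, which is exactly why PDC alone is not enough to decide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not kernel code. */
enum hp_event { HP_EV_DLLSC, HP_EV_PDC };
enum owner { OWNER_AER_DPC, OWNER_HOTPLUG };

/* Decide which component should act on a link event. */
static enum owner link_event_owner(enum hp_event ev, bool fatal_error_pending)
{
	assert(ev == HP_EV_DLLSC || ev == HP_EV_PDC);

	/* Physical removal or insertion is hotplug's job. */
	if (ev == HP_EV_PDC)
		return OWNER_HOTPLUG;

	/* Link down caused by a pending fatal error: AER/DPC recovers
	 * the link on its own, so hotplug stays out of it. */
	if (fatal_error_pending)
		return OWNER_AER_DPC;

	/* Link down with no fatal error, e.g. an intermittent
	 * signal-integrity drop: still hotplug's responsibility. */
	return OWNER_HOTPLUG;
}
```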