On 8/20/2018 6:04 PM, Benjamin Herrenschmidt wrote:
On Mon, 2018-08-20 at 18:02 -0400, Sinan Kaya wrote:
On 8/20/2018 5:53 PM, Benjamin Herrenschmidt wrote:
The hotplug driver removes the devices on link down events and
re-enumerates on insertion.
I am trying to separate fatal error handling from hotplug.
I'll try to take a look. We can't always count on pciehp to do the
removal when a removal occurs, though. The PCIe specification contains
an implementation note that DPC may be used in place of hotplug surprise.
Can't you use the presence detect to differentiate?
Also, I don't have the specs at hand right now, but does the hotplug
bridge have a way to "latch" the change in presence detect so we can
see if it has transitioned even if it's back on?
There is only presence detect change and link layer change. No actual
state information.
It does latch that it has changed though, right? So if presence detect
hasn't changed, we can assume it's an error and not an unplug?
We could discriminate that way to reduce the risk of doing a recovery
without unbind on something that was actually removed and replaced.
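A minimal sketch of the discrimination suggested above, assuming the
port implements presence detect and that the latched "changed" bits of
the Slot Status register have already been read into a variable. The
bit values match the Linux pci_regs.h definitions; the helper name is
hypothetical and is not taken from pciehp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Status bits, values as in Linux include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_PDC   0x0008  /* Presence Detect Changed (latched) */
#define PCI_EXP_SLTSTA_DLLSC 0x0100  /* Data Link Layer State Changed (latched) */

/*
 * Hypothetical helper: returns true when the link state changed but
 * presence detect did not, i.e. the event looks like an error rather
 * than an unplug. Only meaningful if presence detect is implemented.
 */
static bool link_change_without_presence_change(uint16_t sltsta)
{
	return (sltsta & PCI_EXP_SLTSTA_DLLSC) &&
	       !(sltsta & PCI_EXP_SLTSTA_PDC);
}
```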
I proposed this as one of the possible solutions, but presence detect is
optional, and the presence detect interrupt can also be delivered after
the link layer interrupt. There are no guarantees about the order in
which the link layer and presence detect interrupts are delivered.
I instead look at the fatal error pending bit in the device status
register to decide whether the link down was caused by a pending fatal
error or by somebody actually removing the card.
If a fatal error is pending, wait until it is cleared. If the link is
healthy, return gracefully.
Otherwise, proceed with the hotplug removal.
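The decision described above could be sketched roughly as follows. This
is an illustration only, not the actual patch: it assumes the Device
Status and Link Status registers have already been read into plain
values, and the enum and function names are hypothetical. The bit
values match the Linux pci_regs.h definitions.

```c
#include <stdint.h>

/* Bit values as in Linux include/uapi/linux/pci_regs.h */
#define PCI_EXP_DEVSTA_FED   0x0004  /* Device Status: Fatal Error Detected */
#define PCI_EXP_LNKSTA_DLLLA 0x2000  /* Link Status: Data Link Layer Link Active */

/* Hypothetical outcomes for a link down event seen by the hotplug path */
enum link_down_action {
	LINK_DOWN_WAIT_RECOVERY, /* fatal error pending: wait, then re-check */
	LINK_DOWN_IGNORE,        /* link is healthy: return gracefully */
	LINK_DOWN_REMOVE,        /* no error, no link: treat as removal */
};

/* Classify one sample of the two registers (hypothetical helper). */
static enum link_down_action classify_link_down(uint16_t devsta,
						 uint16_t lnksta)
{
	if (devsta & PCI_EXP_DEVSTA_FED)
		return LINK_DOWN_WAIT_RECOVERY;
	if (lnksta & PCI_EXP_LNKSTA_DLLLA)
		return LINK_DOWN_IGNORE;
	return LINK_DOWN_REMOVE;
}
```

A caller would presumably re-read both registers and classify again
while the result is LINK_DOWN_WAIT_RECOVERY, with some timeout.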
Cheers,
Ben.