On 7/29/18, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> On Sat, Jul 28, 2018 at 05:26:57PM -0700, Sinan Kaya wrote:
>> On 7/28/2018 11:31 AM, Lukas Wunner wrote:
>> >The knowledge whether a surprise removal or a safe removal is at hand
>> >does exist further up in the call stack: A surprise removal is
>> >initiated by pciehp_handle_presence_or_link_change(), a safe removal by
>> >pciehp_handle_disable_request().
>>
>> Can you also check if platform supports surprise link down error
>> reporting (Link Capabilities Register) and reports a surprise link
>> down event in AER Uncorrectable Error Status Register for the
>> hotplug code to make it more reliable?
>
> We read the Link Capabilities register in pcie_init() to determine if
> Data Link Layer Link Active Reporting is supported. (That's a feature
> added in the PCIe r1.1 Base Spec. Old devices that strictly adhere to
> PCIe r1.0 don't support it.)
>
> We could likewise cache the Surprise Down Error Reporting Capable bit
> in struct controller. But I don't quite understand yet how and when
> you want it to be used by pciehp? If the link goes down, pciehp doesn't
> care whether that's caused by a fatal error or removal by the user.
> It seems correct to me to also remove devices on a fatal error, after all
> they're no longer accessible until the error is cleared (IIUC).
> Do you agree or disagree?

Yes, we have to remove the devices in both cases. However, I don't think
pciehp is the right place to handle link events caused by fatal errors.
I am trying to keep these two execution paths separate. If we can't,
someone will have to take on the challenge of unifying the link handling
code for both. Right now, two threads are trying to do the same thing in
parallel.

Why the Surprise Down check? The data link layer link state can change
due to signal integrity issues. The spec defines the Surprise Down bit to
distinguish device removal from other link quality issues. That's why I
was suggesting checking it if a PCIe device supports it.

>
> Thanks,
>
> Lukas
>
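To make the proposal concrete, here is a minimal standalone sketch of caching the two Link Capabilities bits discussed above and consulting the AER Uncorrectable Error Status value on a link-down event. The three bit constants use the values from Linux include/uapi/linux/pci_regs.h; `struct link_caps`, `cache_link_caps()`, `enum surpdn_result` and `check_surprise_down()` are hypothetical names for illustration only, not existing pciehp code. The sketch operates on raw register values that the caller is assumed to have already read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values matching Linux include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKCAP_SDERC   0x00080000u /* Link Capabilities: Surprise Down Error Reporting Capable */
#define PCI_EXP_LNKCAP_DLLLARC 0x00100000u /* Link Capabilities: Data Link Layer Link Active Reporting Capable */
#define PCI_ERR_UNC_SURPDN     0x00000020u /* AER Uncorrectable Error Status: Surprise Down */

/* Hypothetical per-port cache, analogous to fields one might add to struct controller. */
struct link_caps {
	bool link_active_reporting;   /* DLLLARC bit was set */
	bool surprise_down_reporting; /* SDERC bit was set */
};

/* Decode the raw 32-bit Link Capabilities value once, e.g. at init time. */
static struct link_caps cache_link_caps(uint32_t lnkcap)
{
	struct link_caps caps = {
		.link_active_reporting = (lnkcap & PCI_EXP_LNKCAP_DLLLARC) != 0,
		.surprise_down_reporting = (lnkcap & PCI_EXP_LNKCAP_SDERC) != 0,
	};
	return caps;
}

/* Hypothetical outcome of inspecting a link-down event. */
enum surpdn_result {
	SURPDN_NOT_REPORTABLE, /* port does not advertise Surprise Down reporting */
	SURPDN_FLAGGED,        /* Surprise Down bit set in the AER status value */
	SURPDN_NOT_FLAGGED,    /* port is capable, but the bit is not set for this event */
};

/* Look at the AER status bit only if the port claims it can report Surprise Down. */
static enum surpdn_result check_surprise_down(const struct link_caps *caps,
					      uint32_t aer_uncor_status)
{
	if (!caps->surprise_down_reporting)
		return SURPDN_NOT_REPORTABLE;
	return (aer_uncor_status & PCI_ERR_UNC_SURPDN) ? SURPDN_FLAGGED
						       : SURPDN_NOT_FLAGGED;
}
```

In a kernel context the raw values would presumably come from pcie_capability_read_dword() and pci_read_config_dword() at the AER capability offset. How each result should be routed (pciehp vs. the fatal error path) is exactly the policy question being debated in this thread, so the sketch deliberately stops at classification.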