On Mon, Mar 12, 2018 at 08:16:38PM +0530, poza@xxxxxxxxxxxxxx wrote:
> On 2018-03-12 19:55, Keith Busch wrote:
> > On Sun, Mar 11, 2018 at 11:03:58PM -0400, Sinan Kaya wrote:
> > > On 3/11/2018 6:03 PM, Bjorn Helgaas wrote:
> > > > On Wed, Feb 28, 2018 at 10:34:11PM +0530, Oza Pawandeep wrote:
> > > >
> > > > That difference has been there since the beginning of DPC, so it has
> > > > nothing to do with *this* series EXCEPT for the fact that it really
> > > > complicates the logic you're adding to reset_link() and
> > > > broadcast_error_message().
> > > >
> > > > We ought to be able to simplify that somehow because the only real
> > > > difference between AER and DPC should be that DPC automatically
> > > > disables the link and AER does it in software.
> > >
> > > I agree this should be possible. Code execution path should be almost
> > > identical to fatal error case.
> > >
> > > Is there any reason why you went to stop driver path, Keith?
> >
> > The fact is the link is truly down during a DPC event. When the link
> > is enabled again, you don't know at that point if the device(s) on the
> > other side have changed. Calling a driver's error handler for the wrong
> > device in an unknown state may have undefined results. Enumerating the
> > slot from scratch should be safe, and will assign resources, tune bus
> > settings, and bind to the matching driver.
> >
> > Per spec, DPC is the recommended way for handling surprise removal
> > events and even recommends DPC capable slots *not* set 'Surprise'
> > in Slot Capabilities so that removals are always handled by DPC. This
> > service driver was developed with that use in mind.
>
> Now it begs the question, that
>
> after DPC trigger
>
> should we enumerate the devices, ?
> or
> error handling callbacks, followed by stop devices followed by enumeration ?
> or
> error handling callbacks, followed by enumeration ? (no stop devices)

I'm not sure I understand. The link is disabled while DPC is triggered,
so if anything, you'd want to un-enumerate everything below the contained
port (that's what it does today).

After releasing a slot from DPC, the link is allowed to retrain. If there
is a working device on the other side, a link up event occurs. That event
is handled by the pciehp driver, and that schedules enumeration no matter
what you do to the DPC driver.
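
To make that ordering concrete, here is a rough toy model of the sequence
described above. It is plain Python, not kernel code: Port, dpc_trigger(),
dpc_release() and hotplug_link_up() are hypothetical names made up for
illustration and do not correspond to functions in the DPC or pciehp
drivers. It only sketches the point that enumeration is driven by the
link up event, not by the DPC side.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    """Toy model of a DPC-capable downstream port (illustrative only)."""
    link_up: bool = True
    contained: bool = False  # True while DPC holds the link down
    children: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def dpc_trigger(port: Port) -> None:
    # The link is disabled while DPC is triggered, so everything
    # below the contained port is un-enumerated.
    port.contained = True
    port.link_up = False
    port.log.append("dpc: link disabled")
    port.children.clear()
    port.log.append("dpc: devices below port removed")


def hotplug_link_up(port: Port, device: str) -> None:
    # The hotplug side enumerates whatever is there now, from scratch.
    port.log.append("hotplug: link up, enumerating")
    port.children.append(device)


def dpc_release(port: Port, device_present: Optional[str]) -> None:
    # Releasing containment only allows the link to retrain. A link up
    # event follows only if a working device is on the other side.
    port.contained = False
    port.log.append("dpc: released, link may retrain")
    if device_present is not None:
        port.link_up = True
        hotplug_link_up(port, device_present)
```

In this model the device found after release may differ from the one that
was there before the trigger, which is why nothing from the old device is
carried across.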