On Mon, Aug 20, 2018 at 05:21:30PM -0400, Sinan Kaya wrote:
> On 8/20/2018 5:05 PM, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-08-20 at 09:53 -0600, Keith Busch wrote:
> > > On Mon, Aug 20, 2018 at 09:22:27PM +1000, Benjamin Herrenschmidt wrote:
> > > > The main problem with unplug/replug (as I mentioned earlier) is that it
> > > > just does NOT work for storage controllers (or similar type of
> > > > devices). The links between the storage controller and the mounted
> > > > filesystems is lost permanently, you'll most likely have to reboot the
> > > > machine.
> > >
> > > You probably shouldn't mount raw storage devices if they can be hot
> > > added/removed. There are device mappers for that! :)
> >
> > This is not about hot adding/removing, it's about error recovery.
> >
> > > And you can't just change DPC device removal. A DPC event triggers
> > > the link down, and that will trigger pciehp to disconnect the subtree
> > > anyway. Having DPC do it too just means you get the same behavior with
> > > or without enabling STLCTL.DLLSC.
> >
> > This is wrong. EEH can trigger a link down to and we don't remove the
> > subtree in that case. We allow the drivers to recover.
> >
>
> I have a patch to solve this issue.
>
> https://lkml.org/lkml/2018/8/19/124
>
> Hotplug driver removes the devices on link down events and re-enumerates
> on insertion.
>
> I am trying to separate fatal error handling from hotplug.

I'll try to take a look. We can't always count on pciehp to do the removal
when a removal occurs, though. The PCIe specification contains an
implementation note that DPC may be used in place of hotplug surprise.