On Thu, 2018-08-16 at 17:05 +1000, Benjamin Herrenschmidt wrote:
> Those changes are utterly broken.
>
> The basic premise of the design is that we would do that unplug/replug
> trick if and ONLY IF the driver doesn't have the appropriate callbacks.
>
> We are also now looking at replacing this with an unbind/re-bind because
> in practice, the unplugging is causing us all sorts of problems. Sam
> (CC) can elaborate.
>
> Bjorn, we are the main authors of that spec (Linas wrote it under my
> supervision) and created those callbacks for EEH. AER picked them up
> only later. Those changes must be at the very least acked by us before
> going upstream.

Also, I had a quick look at the DPC spec; it looks like a subset of our
EEH capability.

Again, the code in Linux was written without consulting us,
misunderstands/misuses the driver callbacks, does unplugs when it
shouldn't, etc. It's all very broken.

Please, at the very least revert the spec changes. They are utterly
wrong. The driver MUST remain active during the recovery process
*including* fatal errors. Only if the recovery fails and the driver
gives up may you choose to unplug the device (though there is little
point).

What you have designed will work fine for network drivers but will not
work for storage.

Ben.