On Thu, 2018-11-08 at 23:06 +0000, Alex_Gagniuc@xxxxxxxxxxxx wrote:
> On 11/08/2018 04:51 PM, Greg KH wrote:
> > On Thu, Nov 08, 2018 at 10:49:08PM +0000, Alex_Gagniuc@xxxxxxxxxxxx wrote:
> > > In the case that we're trying to fix, this code executing is a result of
> > > the device being gone, so we can guarantee race-free operation. I agree
> > > that there is a race, in the general case. As far as checking the result
> > > for all F's, that's not an option when firmware crashes the system as a
> > > result of the mmio read/write. It's never pretty when firmware gets
> > > involved.
> >
> > If you have firmware that crashes the system when you try to read from a
> > PCI device that was hot-removed, that is broken firmware and needs to be
> > fixed. The kernel can not work around that as again, you will never win
> > that race.
>
> But it's not the firmware that crashes. It's linux as a result of a
> fatal error message from the firmware. And we can't fix that because FFS
> handling requires that the system reboots [1].

Do we know the exact circumstances that result in firmware requesting a
reboot? If it happens on any PCIe error I don't see what we can do to
prevent that beyond masking UEs entirely (are we even allowed to do that
on FFS systems?).

> If we're going to say that we don't want to support FFS because it's a
> separate code path, and different flow, that's fine. I am myself, not a
> fan of FFS. But if we're going to continue supporting it, I think we'll
> continue to have to resolve these sort of unintended consequences.
>
> Alex
>
> [1] ACPI 6.2, 18.1 - Hardware Errors and Error Sources