On Mon, Jan 15, 2018 at 09:43:22AM -0500, Sinan Kaya wrote:
> On 12/19/2017 4:06 PM, Keith Busch wrote:
> > @@ -289,6 +290,9 @@ static int dpc_probe(struct pcie_device *dev)
> >  	int status;
> >  	u16 ctl, cap;
> >
> > +	if (pcie_aer_get_firmware_first(pdev))
> > +		return -ENOTSUPP;
> > +
>
> There are two ways to support firmware first handling along with DPC.
>
> The first one is to tie DPC handling to the firmware first enable.
>
> The second one is to enable DPC ERR_COR signalling so that firmware
> gets notified on each DPC event occurrence.
>
> While the first one gives more control to the firmware, I think it defeats
> the purpose of the DPC. The first approach requires firmware to do some
> "pre-processing" before notifying operating system of a failure.
>
> The goal of the DPC is to put hardware into safe state when a PCIe error
> happens. The best software recovery following this is to notify endpoint
> drivers of failures and shutdown threads/processes accessing the hardware
> as quickly as possible.
>
> The firmware-first event notification is through ACPI GHES and firmware
> injects an artificial uncorrected AER error to the operating system. Once
> OS gets notified, it tries to recover drivers through AER fatal error
> recovery mechanism.
>
> While the semantics of this path are clearly defined in ACPI, it is also
> known to be slow. During the time firmware does its business, operating
> system still could be trying to access the endpoint address space.
>
> My suggestion is to enable ERR_COR signalling so firmware gets a
> notification on each DPC event for logging purposes.
>
> OS handles DPC natively and tries to recover hardware without any external
> influence.

I see what you're saying, but if a device has a firmware first policy,
doesn't that mean firmware owns the DPC Control register? The OS shouldn't
be mucking with it in that case, right?
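
To make the two options concrete, here is a rough standalone sketch of how each one would treat the DPC Control register. It is an illustration only, not driver code: the helper dpc_plan_ctl(), the enum, and the macro names are hypothetical, and the bit values (in particular the ERR_COR enable bit) are my reading of the DPC Control register layout and should be checked against the spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DPC Control register bits; verify against the PCIe spec. */
#define DPC_CTL_EN_FATAL	0x0001u	/* trigger DPC on ERR_FATAL */
#define DPC_CTL_INT_EN		0x0008u	/* DPC interrupt to the OS */
#define DPC_CTL_ERR_COR_EN	0x0010u	/* send ERR_COR on a DPC event */

/* The two policies discussed in this thread (names are hypothetical). */
enum dpc_policy {
	DPC_POLICY_SKIP_IF_FF,		/* patch: bail out under firmware first */
	DPC_POLICY_NATIVE_WITH_ERR_COR,	/* suggestion: OS handles, firmware logs */
};

/*
 * Compute the DPC Control value the OS would write.
 * Returns false when the OS leaves the register to firmware, in which
 * case *out is the unmodified input value.
 */
static bool dpc_plan_ctl(enum dpc_policy policy, bool firmware_first,
			 uint16_t ctl, uint16_t *out)
{
	if (firmware_first && policy == DPC_POLICY_SKIP_IF_FF) {
		*out = ctl;	/* firmware owns the register, do not touch */
		return false;
	}

	/* OS handles DPC natively. */
	ctl |= DPC_CTL_EN_FATAL | DPC_CTL_INT_EN;

	/* Let firmware see each DPC event for logging. */
	if (firmware_first)
		ctl |= DPC_CTL_ERR_COR_EN;

	*out = ctl;
	return true;
}
```

With DPC_POLICY_SKIP_IF_FF the register is never written on a firmware-first platform, which matches the ownership concern above. With DPC_POLICY_NATIVE_WITH_ERR_COR the OS writes the register even though firmware first is set, which is exactly the part in question.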