On Mon, Oct 29, 2018 at 04:06:51PM -0500, Bjorn Helgaas wrote:
> [+cc Rafael, Len, Tony, Borislav, Tyler, Christoph, linux-acpi, LKML]
>
> On Fri, Oct 26, 2018 at 02:19:04PM -0600, Jon Derrick wrote:
> > Add a bit in pci_host_bridge to indicate to leave the System Error
> > Interrupts as configured by the pre-boot environment. Propagate this to
> > the AER driver which disables System Error Interrupts.

This commit message should explain not what the patch does (that's
obvious) but why it is doing it.

> >
> > Signed-off-by: Jon Derrick <jonathan.derrick@xxxxxxxxx>
> > ---
> >  drivers/pci/pcie/aer.c | 7 +++++--
> >  include/linux/pci.h    | 3 +++
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 83180ed..6a4af63 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1360,6 +1360,7 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
> >  static void aer_enable_rootport(struct aer_rpc *rpc)
> >  {
> >  	struct pci_dev *pdev = rpc->rpd;
> > +	struct pci_host_bridge *host;
> >  	int aer_pos;
> >  	u16 reg16;
> >  	u32 reg32;
> > @@ -1369,8 +1370,10 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
> >  	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
> >
> >  	/* Disable system error generation in response to error messages */
> > -	pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
> > -				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
> > +	host = pci_find_host_bridge(pdev->bus);
> > +	if (!host->no_disable_sys_err)

Double negation: "if (!host->no_disable_sys_err)" could simply be
"if (host->disable_sys_err)", with the flag's sense inverted.

> > +		pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
> > +					   SYSTEM_ERROR_INTR_ON_MESG_MASK);
>
> If I squint hard enough this sort of makes sense, but it also makes me
> confused about how the normal APEI firmware-first model works.
>
> In the NON-firmware-first case, firmware isn't involved in handling AER
> errors.
> The Linux AER driver fields an interrupt from a Root Port,
> reads AER log registers, etc.
>
> In the normal APEI firmware-first case, when the hardware reports an
> AER event, I think firmware gets control first, and *it* reads the AER
> log registers, packages them up, and generates an interrupt to the OS,
> which reads the packaged error state from the firmware via the HEST.
>
> If I understand this special Intel VMD firmware-first case correctly,
> firmware gets control first, reads the AER log registers, and
> synthesizes what looks to the OS like a normal AER interrupt. The

Why? Why the faking?

If firmware needs to get control, why doesn't it then *retain* control
and report the error through HEST, like others do?

AFAIUC, firmware wants to do something underneath. What's wrong with
making it a normal firmware-first case?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.