On Wed, Oct 24, 2018 at 10:47:24AM +0300, Meelis Roos wrote:
> > Would you mind opening a report at https://bugzilla.kernel.org?  I'm
> > not sure if anybody will be able to do anything about this, but it's
> > always possible.
>
> Submitted now, https://bugzilla.kernel.org/show_bug.cgi?id=201503
>
> > A complete dmesg log and "sudo lspci -vv" output from a successful
> > boot would be a good start.  And if you have a screenshot of the
> > failure, that would help, too.  You can use the "ignore_loglevel"
> > kernel parameter to make sure we see everything on the console.
>
> Added.
>
> > Does this machine have an iLO?  If so, it may have logs that
> > could be useful if this is related to some sort of bus error.
>
> Nothing in the ILO logs.

Great, thanks!  Can you try the patch below?  This is extracted from
the code here:
https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805

I'm not sure why this would be only an intermittent problem, but at
least we can see if this is related.

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6bc27b7fd452..842f900ed194 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5113,3 +5113,15 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8575,
 			quirk_switchtec_ntb_dma_alias);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8576,
 			quirk_switchtec_ntb_dma_alias);
+
+static void quirk_amd_8111(struct pci_dev *pdev)
+{
+	u8 ioc;
+
+	pci_read_config_byte(pdev, 0x40, &ioc);
+	if (ioc & 0x80) {
+		pci_info(pdev, "disabling NMI on error\n");
+		pci_write_config_byte(pdev, 0x40, ioc & ~0x80);
+	}
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7468, quirk_amd_8111);