On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
> > Does someone know the current state of the AWS instances using
> > the older Xen version which causes the issues?
>
> AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
> MSI-X vectors only on success") as it fixed a bug. Now several
> downstream distros do carry a revert of this commit, which I believe
> is an unfortunate situation, and I wonder if this can be addressed
> upstream to deal with the AWS m4.large instance issues.

The problem is that, except for a bisect result, we've not seen much
information which might help to debug and analyze the problem.

The guest uses MSI-X on that NIC:

  Capabilities: [70] MSI-X: Enable+ Count=3 Masked-

So looking at the commit in question:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e47b29..6748cf9d7d90 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 		goto out_disable;
 	}
 
-	/* Ensure that all table entries are masked. */
-	msix_mask_all(base, tsize);
-
 	ret = msix_setup_entries(dev, base, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	/* Set MSI-X enabled bits and unmask the function */
 	pci_intx_for_msi(dev, 0);
 	dev->msix_enabled = 1;
+
+	/*
+	 * Ensure that all table entries are masked to prevent
+	 * stale entries from firing in a crash kernel.
+	 *
+	 * Done late to deal with a broken Marvell NVME device
+	 * which takes the MSI-X mask bits into account even
+	 * when MSI-X is disabled, which prevents MSI delivery.
+	 */
+	msix_mask_all(base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);

IOW, it moves the invocation of msix_mask_all() into the success path.
As the device uses MSI-X, this change does not make any difference from
a hardware perspective, simply because _all_ MSI-X interrupts are masked
via the CTRL register already, and it does not matter whether the kernel
masks them individually _before_ or _after_ the allocation. At least not
on real hardware and on a sane emulation.

Now this is XEN, which is neither real hardware nor sane emulation.

It must be a XEN_HVM guest because XEN_PV guests disable PCI/MSI[-X]
completely, which makes the invocation of msix_mask_all() a NOP.

If it's not a XEN_HVM guest, then you can stop reading further as I'm
unable to decode why moving a NOP makes a difference. That belongs in
the realm of voodoo, but then XEN is voodoo, at least for me. :)

XEN guests do not use the common PCI mask/unmask machinery which would
unmask the interrupt on request_irq(). So I assume that the following
happens:

  Guest                          Hypervisor

  msix_capabilities_init()
     ....
     alloc_irq()
       xen_magic()          ->   alloc_msix_interrupt()
                                 request_irq()
     msix_mask_all()        ->   trap
                                   do_magic()
  request_irq()
    unmask()
      xen_magic()           ->   unmask_evtchn()
                                   do_more_magic()

So I assume further that msix_mask_all() actually is able to mask the
interrupts in the hardware (ctrl word of the vector table) despite the
hypervisor having allocated and requested the interrupt already.

Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking
like XEN_PV does. I can only assume the answer is voodoo...

Maybe the XEN people have some more enlightened answers to that.

Thanks,

        tglx