On Thu, Oct 20, 2011 at 04:39:59PM -0700, Eric W. Biederman wrote: > Bjorn Helgaas <bhelgaas@xxxxxxxxxx> writes: > > > On Wed, Oct 19, 2011 at 5:02 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: > >> On Mon, Oct 17, 2011 at 12:46 PM, Eric W. Biederman > >> <ebiederm@xxxxxxxxxxxx> wrote: > >>> > >>> I traced a nasty kexec on panic boot failure to the fact that we had > >>> screaming msi interrupts and we were not disabling the msi messages at > >>> kernel startup. The booting kernel had not enabled those interupts so > >>> was not prepared to handle them. > >>> > >>> I can see no reason why we would ever want to leave the msi interrupts > >>> enabled at boot if something else has enabled those interrupts. The pci > >>> spec specifies that msi interrupts should be off by default. Drivers > >>> are expected to enable the msi interrupts if they want to use them. Our > >>> interrupt handling code reprograms the interrupt handlers at boot and > >>> will not be be able to do anything useful with an unexpected interrupt. > >>> > >>> This patch applies cleanly all of the way back to 2.6.32 where I noticed > >>> the problem. > >>> > >>> Cc: stable@xxxxxxxxxx > >>> > >>> Signed-off-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx> > >>> --- > >>> drivers/pci/msi.c | 10 ++++++++++ > >>> 1 files changed, 10 insertions(+), 0 deletions(-) > >>> > >>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > >>> index 2f10328..e174982 100644 > >>> --- a/drivers/pci/msi.c > >>> +++ b/drivers/pci/msi.c > >>> @@ -869,5 +869,15 @@ EXPORT_SYMBOL(pci_msi_enabled); > >>> > >>> void pci_msi_init_pci_dev(struct pci_dev *dev) > >>> { > >>> + int pos; > >>> INIT_LIST_HEAD(&dev->msi_list); > >>> + > >>> + /* Disable the msi hardware to avoid screaming interrupts > >>> + * during boot. This is the power on reset default so > >>> + * usually this should be a noop. > >>> + */ > >>> + pos = pci_find_capability(dev, PCI_CAP_ID_MSI); > >>> + if (pos) > >>> + msi_set_enable(dev, pos, 0); > >>> + msix_set_enable(dev, 0); > >>> } > >> > >> We only build msi.o if CONFIG_PCI_MSI=y, but don't you want to disable > >> these even if the new kernel has CONFIG_PCI_MSI=n? > > > > Since problems like this are so nasty to debug, maybe it'd be worth a > > note in dmesg if we found MSI enabled? I guess I don't know exactly > > what we'd do with the information, but sometimes we can't predict in > > advance what things are useful. > > It was nasty because it was hard to get an interrupt to scream, and then > it was just plain weird as it was an oversight in our logic. > > Once I enabled debugging output it was very clear I had a screaming > interrupt. > > A warn once might not be a bad idea. I would hate to print this for > every msi-x capable pci device in the system as we boot up. Still > once we have my patch applied the situation will be handled so I don't > know of any reason why we would care if msi's were enabled at boot up. printk_ratelimited? > > And if there is an irq screaming we do have pretty good messages. This > one was just a little extra tricky because the kernel wasn't expecting > these interrupts. Meaning that the "irq XX: nobody cared" was not printing b/c there was an IRQ handler for this particular interrupt ? -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html