On Thu, Jun 17, 2010 at 12:16 PM, Ben Hutchings <bhutchings@xxxxxxxxxxxxxx> wrote: > During suspend on an SMP system, {read,write}_msi_msg_desc() may be > called to mask and unmask interrupts on a device that is already in a > reduced power state. ÂAt this point memory-mapped registers including > MSI-X tables are not accessible, and config space may not be fully > functional either. > > While a device is in a reduced power state its interrupts are > effectively masked and its MSI(-X) state will be restored when it is > brought back to D0. ÂTherefore these functions can simply read and > write msi_desc::msg for devices not in D0. > > Further, read_msi_msg_desc() should only ever be used to update a > previously written message, so it can always read msi_desc::msg > and never needs to touch the hardware. > > Signed-off-by: Ben Hutchings <bhutchings@xxxxxxxxxxxxxx> > --- > On Mon, 2010-06-14 at 18:13 -0700, Michael Chan wrote: >> I'm debugging the bnx2 driver which doesn't work after suspend/resume if >> it is running in MSI-X mode. ÂThe problem is that during suspend, the >> MSI-X vectors are disabled by the following sequence on x86: >> >> take_cpu_down() -> cpu_disable_common() -> fixup_irqs() >> >> The MSI-X address/data used to disable the vectors are remembered in the >> above sequence. During resume, these address/data are then programmed >> back to the device during pci_restore_state(), causing all the vectors >> to remain disabled. > > That's not quite what I see. ÂWhat I see is that the message is read > back from the table *after* the driver's suspend method has been called. > At this point the device is already in D3 and memory-mapped registers > are not accessible, so we get random bits as the message. ÂAt least, > that's what I see happening with the sfc driver. > >> Some drivers call free_irq() during suspend and request_irq() during >> resume, and that should avoid the problem. Âbnx2 and some other drivers >> do not do that. ÂThese drivers rely on pci_restore_state() to restore >> the MSI-X vectors to the same working state before suspend. >> >> What's the right way to fix this? ÂThanks. > > This is my attempt, which works for sfc. ÂSee if it works for bnx2. > > Ben. > > Âdrivers/pci/msi.c | Â 34 +++++++++++----------------------- > Â1 files changed, 11 insertions(+), 23 deletions(-) > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > index 77b68ea..03f04dc 100644 > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -196,30 +196,15 @@ void unmask_msi_irq(unsigned int irq) > Âvoid read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg) > Â{ > Â Â Â Âstruct msi_desc *entry = get_irq_desc_msi(desc); > - Â Â Â if (entry->msi_attrib.is_msix) { > - Â Â Â Â Â Â Â void __iomem *base = entry->mask_base + > - Â Â Â Â Â Â Â Â Â Â Â entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; > > - Â Â Â Â Â Â Â msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR); > - Â Â Â Â Â Â Â msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR); > - Â Â Â Â Â Â Â msg->data = readl(base + PCI_MSIX_ENTRY_DATA); > - Â Â Â } else { > - Â Â Â Â Â Â Â struct pci_dev *dev = entry->dev; > - Â Â Â Â Â Â Â int pos = entry->msi_attrib.pos; > - Â Â Â Â Â Â Â u16 data; > + Â Â Â /* We do not touch the hardware (which may not even be > + Â Â Â Â* accessible at the moment) but return the last message > + Â Â Â Â* written. ÂAssert that this is valid, assuming that > + Â Â Â Â* valid messages are not all-zeroes. */ > + Â Â Â BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo | > + Â Â Â Â Â Â Â Âentry->msg.data)); > > - Â Â Â Â Â Â Â pci_read_config_dword(dev, msi_lower_address_reg(pos), > - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &msg->address_lo); > - Â Â Â Â Â Â Â if (entry->msi_attrib.is_64) { > - Â Â Â Â Â Â Â Â Â Â Â pci_read_config_dword(dev, msi_upper_address_reg(pos), > - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &msg->address_hi); > - Â Â Â Â Â Â Â Â Â Â Â pci_read_config_word(dev, msi_data_reg(pos, 1), &data); > - Â Â Â Â Â Â Â } else { > - Â Â Â Â Â Â Â Â Â Â Â msg->address_hi = 0; > - Â Â Â Â Â Â Â Â Â Â Â pci_read_config_word(dev, msi_data_reg(pos, 0), &data); > - Â Â Â Â Â Â Â } > - Â Â Â Â Â Â Â msg->data = data; > - Â Â Â } > + Â Â Â *msg = entry->msg; > Â} > > Âvoid read_msi_msg(unsigned int irq, struct msi_msg *msg) > @@ -232,7 +217,10 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg) > Âvoid write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg) > Â{ > Â Â Â Âstruct msi_desc *entry = get_irq_desc_msi(desc); > - Â Â Â if (entry->msi_attrib.is_msix) { > + > + Â Â Â if (entry->dev->current_state != PCI_D0) { This check exposed a problem in ixgb (patch is on the way) where pci_disable_device() was not being called in ixgb_remove(). As a result the current_state was set to PCI_UNKNOWN and the interface failed to work on subsequent load of the driver. Even though the problem was in ixgb, it made me wonder about this check as the presumption here (low power state) may not always be true. Like in the case of unloading a driver, which sets dev->current_state to PCI_UNKNOWN which is not a representation of the _real_ state of the device (actual state could be D0). BTW - quick search shows other drivers that could potentially suffer the faith of ixgb due to lack of pci_disable_device() call on removal. Thanks, Emil -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html