On Mon, Jul 7, 2014 at 2:26 PM, Myron Stowe <myron.stowe@xxxxxxxxx> wrote: > On Tue, Jul 1, 2014 at 1:29 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: >> On Mon, Jun 30, 2014 at 10:49 AM, Myron Stowe <myron.stowe@xxxxxxxxx> wrote: >>> On Tue, Jun 17, 2014 at 3:07 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: >>>> On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe <myron.stowe@xxxxxxxxxx> wrote: >>>>> During PCIe hot-plug initialization - pciehp_probe - data structures >>>>> related to slot capabilities are set up. As part of this set up, ISRs are >>>>> put in place to handle slot events and all event bits are cleared out. >>>>> >>>>> This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC) >>>>> Slot Status bit to the event bits that are cleared out during >>>>> initialization. >>>> >>>> I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change >>>> notifications for hot-plug and removal"). Prior to that, pcie_isr() >>>> didn't look at the PCI_EXP_SLTSTA_DLLSC bit. >>>> >>>> Apparently there's a non-public report of spurious messages like this >>>> at boot-time, with no actual hotplug events: >>>> >>>> pciehp 0000:82:04.0:pcie24: slot(4): Link Up event >>>> pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at >>>> 0000:83:00, cannot hot-add >>>> pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00 >>>> >>>> Device 0000:83:00.0 was enumerated previously, during the normal PCI >>>> device enumeration. I suspect DLLSC was set by the hardware when the >>>> link came up after power-on, and it remained set until Linux booted. >>>> Then when we take the first pciehp interrupt, we notice DLLSC is set >>>> and think it's a new link state change. >>>> >>>> This might be system-specific, because some BIOSes might clear DLLSC >>>> before handing off to the OS. Or my theory might be all wet. >>>> >>>> But I think we should clear DLLSC in any case. We clear all the other >>>> RW1C bits in Slot Status, and I think we should clear DLLSC as well. >>>> >>>> I'd like to include a bugzilla or mailing list reference for the >>>> spurious message, and I'd also like confirmation that this changes >>>> actually fixes it. >>> >>> I just got confirmation from the reporter that the patch fixes the >>> issue they were encountering. >>> >>> I've also asked them numerous time if I could make the bug public as I >>> expect others may be hitting it and could benefit from a BZ and >>> analysis/explanation. I just repeated my request - if they reply >>> positively I'll follow through with some documentation. >> >> I'd be glad to merge this, as soon as we get more details about the >> problem it fixes. I think it's good to include some specifics, e.g., >> "Unexpected Link Up event," etc., because it helps other people with >> the same problem to find the solution. It's OK if details about >> pre-production machines and so on are removed, but it's better if >> changes to generic code are supported by public evidence that >> everybody can evaluate. >> >> If you want to pull out the public stuff into a kernel.org bugzilla, >> that would be fine, too. That would actually be better because then >> we don't have to depend on Red Hat maintaining it. > > Agreed (but I couldn't get the vendor to agree to open up the Red Hat > BZ). I've opened https://bugzilla.kernel.org/show_bug.cgi?id=79611 > and included a scrubbed 'dmesg' log there along with an analysis of > how we are hitting the issue. Beautiful, thanks! I marked this for stable and put this in pci/hotplug for v3.17. >>>>> Reference: >>>>> PCI-SIG. PCI Express Base Specification Revision 4.0 Version 0.3 >>>>> (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah). >>>>> >>>>> Signed-off-by: Myron Stowe <myron.stowe@xxxxxxxxxx> >>>>> --- >>>>> >>>>> drivers/pci/hotplug/pciehp_hpc.c | 2 +- >>>>> 1 files changed, 1 insertions(+), 1 deletions(-) >>>>> >>>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c >>>>> index 42914e0..0568416 100644 >>>>> --- a/drivers/pci/hotplug/pciehp_hpc.c >>>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c >>>>> @@ -794,7 +794,7 @@ struct controller *pcie_init(struct pcie_device *dev) >>>>> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, >>>>> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD | >>>>> PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC | >>>>> - PCI_EXP_SLTSTA_CC); >>>>> + PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC); >>>>> >>>>> /* Disable software notification */ >>>>> pcie_disable_notification(ctrl); >>>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in >>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html