Hi Rajat, excellent summary. Another option might be to introduce a blacklist instead of a callback. Guenter ________________________________________ From: Rajat Jain <rajatxjain@xxxxxxxxx> Sent: Friday, August 29, 2014 7:25 PM To: Bjorn Helgaas Cc: linux-pci@xxxxxxxxxxxxxxx; Rafael J. Wysocki; Rajat Jain; Guenter Roeck Subject: Re: pciehp regression (bug #79701) Hello, On Fri, Aug 29, 2014 at 3:36 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: > Hi Rajat, > > This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was > bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch > status while disabling"). Can you take a look? > Hi, I looked at this and wanted to share by observations: The Basic Issue ============== There are a bunch of quick hotplug events (unplug followed by the hot-plug) that are received by the hotplug driver. While both the hotplug drivers (pciehp and acpiphp) are fine with it, the radeon driver itself is probably not equipped enough to handle them so well? [ 41.224428] trying to unbind memory from uninitialized GART ! When acpiphp was being used ======================= As Rafael mentions in this commit log, this is a problem with the VGA subsystem, that requires the hot-plug driver to ignore such hot-plug events associated with a slot that connects to such known Radeon controllers. This was done for acpiphp by introducing a "no_hotplug" flag for the ACPI: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736 The above commit would fix the problem if the acpiphp is used, by ignoring the hot-plug events for that slot. Switch to using pciehp ================= 1) For some reason, the system now seems to use pciehp for these slots instead of the acpiphp (can someone please tell if this looks OK? I only ask because I see the concerned Rafael's log getting printed that seems to indicate that he is expecting the acpiphp to control this slot?). But I also see that the pciehp has already grabbed the slot by the time this messages gets printed: [ 4.419180] VGA switcheroo: detected switching method \_SB_.PCI0.VGA_.ATPX handle Even with using pciehp, things were still all right until the commit 02e93a8a7, beacuse the pciehp used to ignore the hot-unplug events (including loss-of-presence-detect and link-down) if (1) SURPRISE removal is not supported or (2) ADAPTER is not present (which is what this commit addresses). Thus the hot unplug event used to come, the pciehp_disable_slot() used to find no adapter and refused to do anything. Why problem started with pciehp ========================== Essentially the commits 02e93a8a7 and 2b3940b60 made the pciehp handle all hot-unplug events (loss-of-presence-detect and link-downs) irrespective of whether the the SURPRISE removal was supported or not, and also if ADAPTER is not present. Now, I would think that both these commits are still valid because it makes no sense to ignore an unplug event (and let the kernel continue with stale data structures) just because SURPRISE is not set, or the ADAPTER is not present (The latter is an even better reason to process the unplug event). My recommendations / Options ======================== 1) I would first like an opinion on whether it is OK to see the pciehp handle these hotplug slots. The radeon code seems to be ACPI intensive, and Rafael's commit also seems to say that this was supposed to be handled by acpiphp. 2) If it is expected to continue using pciehp, may be we could handle it in the same way as Rafael did for acpiphp. We could add a flag in the pci_dev ("ignore_hp_events" or something) and set it for the hot pluggable slot from radeon code, just like acpi_bus_no_hotplug() is called today. I'll be going out for vacation for the 3 days, and would be glad to submit a patch if needed. One question to the gentleman who bisected this. (SpacemanSpiff). Would it be possible for you to look for the following messages while trying out the image just before the commit 02e93a8a7c? ... No adapter on slot(2) ... Thanks & Best Regards, Rajat Jain -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html