Updated https://bugzilla.kernel.org/show_bug.cgi?id=79701 with the
possible options to make forward progress.

Thanks,

Rajat

On Fri, Aug 29, 2014 at 7:25 PM, Rajat Jain <rajatxjain@xxxxxxxxx> wrote:
> Hello,
>
> On Fri, Aug 29, 2014 at 3:36 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> Hi Rajat,
>>
>> This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was
>> bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch
>> status while disabling"). Can you take a look?
>>
>
> Hi,
>
> I looked at this and wanted to share my observations:
>
> The Basic Issue
> ===============
> There are a bunch of quick hotplug events (unplug followed by the
> hot-plug) that are received by the hotplug driver. While both the
> hotplug drivers (pciehp and acpiphp) are fine with it, the radeon
> driver itself is probably not equipped enough to handle them so well?
> [ 41.224428] trying to unbind memory from uninitialized GART !
>
> When acpiphp was being used
> ===========================
> As Rafael mentions in this commit log, this is a problem with the VGA
> subsystem, which requires the hot-plug driver to ignore such hot-plug
> events associated with a slot that connects to such known Radeon
> controllers. This was done for acpiphp by introducing a "no_hotplug"
> flag for the ACPI:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736
> The above commit would fix the problem if the acpiphp is used, by
> ignoring the hot-plug events for that slot.
>
> Switch to using pciehp
> ======================
> 1) For some reason, the system now seems to use pciehp for these slots
> instead of the acpiphp (can someone please tell if this looks OK? I
> only ask because I see the log message from Rafael's commit getting
> printed, which seems to indicate that he is expecting the acpiphp to
> control this slot?).
> But I also see that the pciehp has already grabbed the slot by
> the time this message gets printed:
> [ 4.419180] VGA switcheroo: detected switching method
> \_SB_.PCI0.VGA_.ATPX handle
>
> Even with using pciehp, things were still all right until the commit
> 02e93a8a7, because the pciehp used to ignore the hot-unplug events
> (including loss-of-presence-detect and link-down) if (1) SURPRISE
> removal is not supported or (2) ADAPTER is not present (which is what
> this commit addresses). Thus when the hot-unplug event came,
> pciehp_disable_slot() used to find no adapter and refused to do
> anything.
>
> Why problem started with pciehp
> ===============================
> Essentially the commits 02e93a8a7 and 2b3940b60 made the pciehp handle
> all hot-unplug events (loss-of-presence-detect and link-downs)
> irrespective of whether the SURPRISE removal was supported or not,
> and also when the ADAPTER is not present. Now, I would think that both
> these commits are still valid because it makes no sense to ignore an
> unplug event (and let the kernel continue with stale data structures)
> just because SURPRISE is not set, or the ADAPTER is not present (the
> latter is an even better reason to process the unplug event).
>
> My recommendations / Options
> ============================
> 1) I would first like an opinion on whether it is OK to see the pciehp
> handle these hotplug slots. The radeon code seems to be ACPI
> intensive, and Rafael's commit also seems to say that this was
> supposed to be handled by acpiphp.
>
> 2) If it is expected to continue using pciehp, maybe we could handle
> it in the same way as Rafael did for acpiphp. We could add a flag in
> the pci_dev ("ignore_hp_events" or something) and set it for the hot
> pluggable slot from radeon code, just like acpi_bus_no_hotplug() is
> called today.
>
> I'll be going out for vacation for 3 days, and would be glad to
> submit a patch if needed.
>
> One question to the gentleman who bisected this.
> (SpacemanSpiff).
> Would it be possible for you to look for the following messages while
> trying out the image just before the commit 02e93a8a7c?
> ...
> No adapter on slot(2)
> ...
>
> Thanks & Best Regards,
>
> Rajat Jain