Re: pciehp regression (bug #79701)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Updated

https://bugzilla.kernel.org/show_bug.cgi?id=79701

with the possible options to make forward progress.

Thanks,

Rajat


On Fri, Aug 29, 2014 at 7:25 PM, Rajat Jain <rajatxjain@xxxxxxxxx> wrote:
> Hello,
>
> On Fri, Aug 29, 2014 at 3:36 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> Hi Rajat,
>>
>> This bug report: https://bugzilla.kernel.org/show_bug.cgi?id=79701 was
>> bisected to 02e93a8a7c1d ("PCI: pciehp: Don't check adapter or latch
>> status while disabling").  Can you take a look?
>>
>
> Hi,
>
> I looked at this and wanted to share by observations:
>
> The Basic Issue
> ==============
> There are a bunch of quick hotplug events (unplug followed by the
> hot-plug) that are received by the hotplug driver. While both the
> hotplug drivers (pciehp and acpiphp) are fine with it, the radeon
> driver itself is probably not equipped enough to handle them so well?
> [   41.224428] trying to unbind memory from uninitialized GART !
>
> When acpiphp was being used
> =======================
> As Rafael mentions in this commit log, this is a problem with the VGA
> subsystem, that requires the hot-plug driver to ignore such hot-plug
> events associated with a slot that connects to such known Radeon
> controllers. This was done for acpiphp by introducing a "no_hotplug"
> flag for the ACPI:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736
> The above commit would fix the problem if the acpiphp is used, by
> ignoring the hot-plug events for that slot.
>
> Switch to using pciehp
> =================
> 1) For some reason, the system now seems to use pciehp for these slots
> instead of the acpiphp (can someone please tell if this looks OK? I
> only ask because I see the concerned Rafael's log getting printed that
> seems to indicate that he is expecting the acpiphp to control this
> slot?). But I also see that the pciehp has already grabbed the slot by
> the time this messages gets printed:
> [    4.419180] VGA switcheroo: detected switching method
> \_SB_.PCI0.VGA_.ATPX handle
>
> Even with using pciehp, things were still all right until the commit
> 02e93a8a7, beacuse the pciehp used to ignore the hot-unplug events
> (including loss-of-presence-detect and link-down) if (1) SURPRISE
> removal is not supported or (2) ADAPTER is not present (which is what
> this commit addresses). Thus the hot unplug event used to come, the
> pciehp_disable_slot() used to find no adapter and refused to do
> anything.
>
> Why problem started with pciehp
> ==========================
> Essentially the commits 02e93a8a7 and 2b3940b60 made the pciehp handle
> all hot-unplug events (loss-of-presence-detect and link-downs)
> irrespective of whether the the SURPRISE removal was supported or not,
> and also if ADAPTER is not present. Now, I would think that both these
> commits are still valid because it makes no sense to ignore an unplug
> event (and let the kernel continue with stale data structures) just
> because SURPRISE is not set, or the ADAPTER is not present (The latter
> is an even better reason to process the unplug event).
>
> My recommendations / Options
> ========================
> 1) I would first like an opinion on whether it is OK to see the pciehp
> handle these hotplug slots. The radeon code seems to be ACPI
> intensive, and Rafael's commit also seems to say that this was
> supposed to be handled by acpiphp.
>
> 2) If it is expected to continue using pciehp, may be we could handle
> it in the same way as Rafael did for acpiphp. We could add a flag in
> the pci_dev ("ignore_hp_events" or something) and set it for the hot
> pluggable slot from radeon code, just like acpi_bus_no_hotplug() is
> called today.
>
> I'll be going out for vacation for the 3 days, and would be glad to
> submit a patch if needed.
>
> One question to the gentleman who bisected this. (SpacemanSpiff).
> Would it be possible for you to look for the following messages while
> trying out the image just before the commit 02e93a8a7c?
> ...
> No adapter on slot(2)
> ...
>
> Thanks & Best Regards,
>
> Rajat Jain
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux