Re: pciehp is broken from 4.10-rc1

Lukas Wunner <lukas@xxxxxxxxx> · Sat, 4 Feb 2017 09:12:54 +0100

On Fri, Feb 03, 2017 at 11:00:19PM -0800, Yinghai Lu wrote:
> On Thu, Feb 2, 2017 at 9:52 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > Could you check if the port above 0000:60:03.2 is runtime suspended
> > when you're trying this, i.e. does its power/runtime_status entry in
> > sysfs say "suspended"?
> 
> yes.

Huh?  That shouldn't happen: the port 0000:60:03.2 should block its
parents from runtime suspending (by way of the is_hotplug_bridge check
in pci_dev_check_d3cold()).

Maybe there's a misunderstanding here.  I was referring to the port
*above* 0000:60:03.2.  I don't know its device name; it's not apparent
from the logs you've posted so far.
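
In case it helps to find that device, here is a rough sketch of how the
parent could be looked up from sysfs.  The function name and the optional
second argument (an alternate sysfs root, only useful for trying it against
a fake tree) are made up for illustration.  It assumes the usual layout
where /sys/bus/pci/devices/<dev> is a symlink into /sys/devices/...

```shell
# Hypothetical helper: print the runtime PM status of the device
# directly above a given PCI device.
# $1 = PCI address, $2 = sysfs root (assumed default: /sys)
parent_runtime_status() {
    dev="$1"
    sysfs="${2:-/sys}"

    # Resolving the symlink yields the real device path; the directory
    # containing it belongs to the parent device.
    path=$(readlink -f "$sysfs/bus/pci/devices/$dev") || return 1
    parent=$(dirname "$path")

    # Bail out if the parent exposes no runtime PM status.
    [ -r "$parent/power/runtime_status" ] || return 1

    echo "$(basename "$parent"): $(cat "$parent/power/runtime_status")"
}
```

Calling it as "parent_runtime_status 0000:60:03.2" prints the parent's name
and status.  If the name printed looks like "pci0000:60" rather than a
device address, the parent is presumably the host bridge and there is no
port above.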

I've opened a bugzilla entry for this:
https://bugzilla.kernel.org/show_bug.cgi?id=193951

Please attach the output of "lspci -vvvvxxxx" and full dmesg output.

> > If you add pm_runtime_get_sync(&ctrl->pcie->port->dev) in
> > drivers/pci/pciehp_ctrl.c:pciehp_enable_slot() before the call to
> > pciehp_get_power_status(), and a corresponding pm_runtime_put()
> > afterwards, does the issue go away?
> 
> Still not working.
> 
> the problem is
> sca05-0a81e0db:~ # echo 0 > /sys/bus/pci/slots/8/power
> [  141.838027] mlx4_core 0000:65:00.0: PME# disabled
> [  143.279434] iommu: Removing device 0000:65:00.0 from group 172
> [  143.292329] pcieport 0000:60:03.2: PME# enabled
> [  143.297431] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command
> 0x11f1 (issued 81476 msec ago)
> [  143.337545] pcieport 0000:60:03.2: PME# disabled
> [  143.380359] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down
> [  143.386735] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event
> ignored; already powering off
> [  143.445483] pcieport 0000:60:03.2: PME# enabled
> [  143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up
> [  143.999004] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event
> queued; currently getting powered off
> [  144.025590] pcieport 0000:60:03.2: PME# disabled
> [  144.133548] pcieport 0000:60:03.2: PME# enabled
> [  144.333603] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled
> sca05-0a81e0db:~ # [  144.357483] pcieport 0000:60:03.2: PME# disabled
> [  144.465566] pcieport 0000:60:03.2: PME# enabled
> 
> we have extra Link Up event queued, while pm_runtime_get_sync/pm_runtime_put ?
>   [  143.445483] pcieport 0000:60:03.2: PME# enabled
>   [  143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up

I notice that with 68db9bc81436 applied, PME is repeatedly enabled and
disabled on the port, presumably whenever it switches from D3 to D0
and vice-versa.
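
To put a number on that toggling, something along these lines could count
the PME messages for one device in the kernel log.  This is only a sketch:
the function name is invented, and it merely matches the
"<dev>: PME# enabled" / "<dev>: PME# disabled" strings as they appear in
the log you posted.

```shell
# Hypothetical helper: count PME enable/disable messages for one PCI
# device in kernel log text read from stdin.
# $1 = PCI address as printed in the log
pme_toggles() {
    awk -v dev="$1" '
        # Plain substring match, so other devices are not counted.
        index($0, dev ": PME# enabled")  { on++ }
        index($0, dev ": PME# disabled") { off++ }
        END { printf "enabled=%d disabled=%d\n", on, off }
    '
}
```

Usage would be "dmesg | pme_toggles 0000:60:03.2", run once with and once
without 68db9bc81436 applied to compare the counts.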

Perhaps this port sends an interrupt while PME is enabled and the slot
is actually occupied, despite the slot having been disabled via sysfs.
That's a case I couldn't test when developing the patch, for lack of
PME-capable hardware.

If you comment out the calls to __pci_enable_wake() in
drivers/pci/pci.c:pci_finish_runtime_suspend(), does the issue go away?

Thanks,

Lukas