Re: Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24.04.2018 16:24, Alan Stern wrote:
On Tue, 24 Apr 2018, Mathias Nyman wrote:

On 23.04.2018 18:11, Alan Stern wrote:
On Mon, 23 Apr 2018, Mathias Nyman wrote:

On 22.04.2018 09:29, russianneuromancer@xxxxx wrote:
Hello!

So far I tested attached patch but didn't tried to revert commit yet,
will do next week.

Result of running patched kernel with recommended debug options:
https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg


Logs show there is a race, controller is suspended, then resumed,
but no interrupt is pending in xhci_resume so roothubs are not resumed,
and host starts to suspend again.

We get the interrupt only after we already started suspending xhci
controller again.

My guess is that when we handle the interrupt we queue work to resume the roothub,
but controller is probably put to D3 suspended state by then,
returning 0xffffffff from some register reads, which driver understands as a dead host.

I need to look into this a bit more.

In this situation, the HCD_WAKEUP_PENDING(hcd) test in
hcd-pci.c:suspend_common() should prevent the controller from going
back into D3.  The WAKEUP_PENDING bit gets set in
usb_hcd_resume_root_hub() and it doesn't get cleared until
hcd_bus_resume() runs.


I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
specific failing case

xhci_resume() has a check:
/* Resume root hubs only when have pending events. */
    status = readl(&xhci->op_regs->status);
      if (status & STS_EINT) {
        usb_hcd_resume_root_hub(xhci->shared_hcd);
        usb_hcd_resume_root_hub(hcd);
      }

If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
can suspend host controller again. when xhci driver finally gets to handle the interrupt
the controller may be in D3 already

This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
could be possible as xhci has interrupt moderation enabled.

Then maybe that test should be removed.  Calling
usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
because there probably are not very many times when the controller gets
resumed without the root hub also being resumed.


The check was added to fix system suspend issue on a runtime suspended host:

commit d6236f6d1d885aa19d1cd7317346fe795227a3cc

    xhci: Fix runtime suspended xhci from blocking system suspend.
The system suspend flow as following:
    1, Freeze all user processes and kenrel threads.
2, Try to suspend all devices. 2.1, If pci device is in RPM suspended state, then pci driver will try
    to resume it to RPM active state in the prepare stage.
2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
    workqueue items to resume usb2&usb3 roothub devices.
2.3, Call suspend callbacks of devices. 2.3.1, All suspend callbacks of all hcd's children, including
    roothub devices are called.
2.3.2, Finally, hcd_pci_suspend callback is called. Due to workqueue threads were already frozen in step 1, the workqueue
    items can't be scheduled, and the roothub devices can't be resumed in
    this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
    usb_hcd_resume_root_hub won't be cleared. Finally,
    hcd_pci_suspend will return -EBUSY, and system suspend fails.

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux