Re: [PATCH v3] usb: warm-reset ports on hub resume, if requested

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> · Thu, 31 Jan 2019 16:56:50 +0200




On 31.01.2019 12:42, Jan-Marek Glogowski wrote:
Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski:
Am 30.01.19 um 15:58 schrieb Mathias Nyman:
On 30.01.2019 14:38, Jan-Marek Glogowski wrote:

The hub->event bits are set when roothub polling timer function gets data from
hub_status_data(), writes the data to a status urb, and returns the urb.

Normal operation:
usb_hcd_poll_rh_status()        // roothub polling timer function
   hcd->driver->hub_status_data(hcd, buffer)   -> xhci_hub_status_data()
     xhci_hub_status_data()      //returns buffer with bits set if a port needs attention
   if (buffer && urb)
     memcpy(urb->transfer_buffer, buffer)       // copy buffer bits to urb
     usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb complete callback function
       hub_irq()
         hub->event_bits = urb->transfer_buffer //
   mod_timer(usb_hcd_poll_rh_status, future)    // reschedule roothub polling timer

I'm guessing usb_hcd_poll_rh_status() never properly completes a URB in your case.

This could be possible if roothub is polled before properly resumed, then the URB wouldn't
be available yet, and no hub->event_bits are set.
Because no bits are set the hub starts suspending again after resume, probably before roothub
timer polls roothub again.
suspend fails when it manually checks hcd->driver->hub_status_data() for activity, and
it start resuming again, and continues this in a loop.

Either roothub polling is stopped at the suspend attempt, or maybe continuously rescheduled
into the future because as a part of suspend attempt/resume loop.

So as you expected the status_urb is always NULL.

I tried to understand more of the code by adding some more debug message.

I looked at usb_hcd_resume_root_hub, which is called when we see the "resume root hub" dmesg.
This just queues the hcd->wakeup_work, so indeed, if usb_hcd_poll_rh_status is later called,
the hub might not have resumed yet and polling starts. So my simple approach was to add a
function to call flush_work(&hcd->wakeup_work) before usb_hcd_poll_rh_status. All variants I
tested killed the kernel. I also tried to add a udelay to no avail.

I think the idea is that if usb_hcd_poll_rh_status() finds a port that needs attention, but
the is URB missing, then a HCD_FLAG_POLL_PENDING flag is set.

This flag is checked in resume when submitting the hub URB, and idea is to react on it:

hub_activate(HUB_RESUME)
    usb_submit_urb(hub->urb)
     rh_urb_enqueue()
       rh_queue_status()
          hcd->status_urb = urb;
          if (HCD_POLL_PENDING(hcd)
             mod_timer(&hcd->rh_timer, jiffies);

It tries to trigger usb_hcd_poll_rh_status() immediately by modifying the timer,
but from your logs it seems that it's not fast enough:

[  485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[  485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
 ...
[  485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[  485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
...
[  485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
[  485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, wakeup 1
[  485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced with wakeup event
[  485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[  485.309765] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x18d, status_urb NULL

hcd->flags 0x18d means HCD_FLAG_POLL_PENDING  was set, but hub_event() workqueue and hub_suspend()
are run before polling the root hub, and I guess the URB gets removed at some point during suspsend.

This then goes on in a loop.

You could try running usb_hcd_poll_rh_status() direcly instead of just kicking the timer and see if
it works:

@@ -822,8 +826,11 @@ static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb)
                mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) * (HZ/4));
 
        /* If a status change has already occurred, report it ASAP */
-       else if (HCD_POLL_PENDING(hcd))
-               mod_timer(&hcd->rh_timer, jiffies);
+       else if (HCD_POLL_PENDING(hcd)) {
+               spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
+               usb_hcd_poll_rh_status(hcd);
+               return 0;
+       }
        retval = 0;
  done:
        spin_unlock_irqrestore (&hcd_root_hub_lock, flags);


I didn't check locking or running contexts if it's ok to do this, there's could be a reason
why it wasn't done like this in the first place.
Worth a shot still.

-Mathias