Am 31. Januar 2019 17:53:13 MEZ schrieb Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>: >This thread has been hard to follow. I'll try to answer some of your >concerns below... > >On Thu, 31 Jan 2019, Jan-Marek Glogowski wrote: > >> Am 31.01.19 um 15:56 schrieb Mathias Nyman: >> > On 31.01.2019 12:42, Jan-Marek Glogowski wrote: >> >> Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski: >> >>> Am 30.01.19 um 15:58 schrieb Mathias Nyman: >> >>>> On 30.01.2019 14:38, Jan-Marek Glogowski wrote: >> >>>> >> >>>> The hub->event bits are set when roothub polling timer function >gets data from >> >>>> hub_status_data(), writes the data to a status urb, and returns >the urb. > >You have been concentrating on what happens with root hubs >(understandable, since that's where your problem occurs). But in order > >to understand the code, you have to realize that it is meant to apply >to _all_ hubs, both root and external. > >> >>>> Normal operation: >> >>>> usb_hcd_poll_rh_status() // roothub polling timer >function >> >>>> hcd->driver->hub_status_data(hcd, buffer) -> >xhci_hub_status_data() >> >>>> xhci_hub_status_data() //returns buffer with bits set >if a port needs attention >> >>>> if (buffer && urb) >> >>>> memcpy(urb->transfer_buffer, buffer) // copy buffer >bits to urb >> >>>> usb_hcd_giveback_urb(urb) -> hub_irq // calls urb >complete callback function >> >>>> hub_irq() >> >>>> hub->event_bits = urb->transfer_buffer // >> >>>> mod_timer(usb_hcd_poll_rh_status, future) // reschedule >roothub polling timer >> >>>> >> >>>> I'm guessing usb_hcd_poll_rh_status() never properly completes a >URB in your case. > >It's not just a question of a timer expiring. If we were dealing with >an external hub, the code would have to wait until an actual USB >transfer had completed. There's no way to avoid that wait. > >> >>>> This could be possible if roothub is polled before properly >resumed, then the URB wouldn't >> >>>> be available yet, and no hub->event_bits are set. >> >>>> Because no bits are set the hub starts suspending again after >resume, probably before roothub >> >>>> timer polls roothub again. >> >>>> suspend fails when it manually checks >hcd->driver->hub_status_data() for activity, and >> >>>> it start resuming again, and continues this in a loop. >> >>>> >> >>>> Either roothub polling is stopped at the suspend attempt, or >maybe continuously rescheduled >> >>>> into the future because as a part of suspend attempt/resume >loop. >> >> >> >> So as you expected the status_urb is always NULL. >> >> >> >> I tried to understand more of the code by adding some more debug >message. >> >> >> >> I looked at usb_hcd_resume_root_hub, which is called when we see >the "resume root hub" dmesg. >> >> This just queues the hcd->wakeup_work, so indeed, if >usb_hcd_poll_rh_status is later called, >> >> the hub might not have resumed yet and polling starts. So my >simple approach was to add a >> >> function to call flush_work(&hcd->wakeup_work) before >usb_hcd_poll_rh_status. All variants I >> >> tested killed the kernel. I also tried to add a udelay to no >avail. >> > >> > I think the idea is that if usb_hcd_poll_rh_status() finds a port >that needs attention, but >> > the is URB missing, then a HCD_FLAG_POLL_PENDING flag is set. >> > >> > This flag is checked in resume when submitting the hub URB, and >idea is to react on it: >> > >> > hub_activate(HUB_RESUME) >> > usb_submit_urb(hub->urb) >> > rh_urb_enqueue() >> > rh_queue_status() >> > hcd->status_urb = urb; >> > if (HCD_POLL_PENDING(hcd) >> > mod_timer(&hcd->rh_timer, jiffies); >> > >> > It tries to trigger usb_hcd_poll_rh_status() immediately by >modifying the timer, >> > but from your logs it seems that it's not fast enough: >> > >> > [ 485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume >> > [ 485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: >get port status, actual port 0 status = 0x2a0 >> > ... >> > [ 485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg >0000 evt 0000 >> > [ 485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: >set port remote wake mask, actual port 0 status = 0xe0002a0 >> > ... >> > [ 485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend >> > [ 485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, >wakeup 1 >> > [ 485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced >with wakeup event >> > [ 485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume >> > [ 485.309765] usb usb2: usb_hcd_poll_rh_status, length 1, >hcd->flags 0x18d, status_urb NULL >> > >> > hcd->flags 0x18d means HCD_FLAG_POLL_PENDING was set, but >hub_event() workqueue and hub_suspend() >> > are run before polling the root hub, and I guess the URB gets >removed at some point during suspsend. >> > >> > This then goes on in a loop. >> > >> > You could try running usb_hcd_poll_rh_status() direcly instead of >just kicking the timer and see if >> > it works: >> > >> > @@ -822,8 +826,11 @@ static int rh_queue_status (struct usb_hcd >*hcd, struct urb *urb) >> > mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) * >(HZ/4)); >> > >> > /* If a status change has already occurred, report it ASAP >*/ >> > - else if (HCD_POLL_PENDING(hcd)) >> > - mod_timer(&hcd->rh_timer, jiffies); >> > + else if (HCD_POLL_PENDING(hcd)) { >> > + spin_unlock_irqrestore (&hcd_root_hub_lock, flags); >> > + usb_hcd_poll_rh_status(hcd); >> > + return 0; >> > + } >> > retval = 0; >> > done: >> > spin_unlock_irqrestore (&hcd_root_hub_lock, flags); >> > >> > >> > I didn't check locking or running contexts if it's ok to do this, >there's could be a reason >> > why it wasn't done like this in the first place. >> > Worth a shot still. > >It's not a good idea. Deferring the operation to a timer keeps things >decoupled (running in different contexts with different stacks); doing >a direct call could get into a loop of nested calls that would keep >going deeper and deeper until it exhausted the kernel stack. > >Perhaps you didn't notice that at the end, hub_activate() calls >kick_hub_wq(). That routine calls >usb_autopm_get_interface_no_resume(), which will prevent the hub trim >suspending until hub_event() calls usb_autopm_put_interface(). >Therefore to make things work correctly, hub_activate() has to ensure >that the hub->event_bits and hub->change_bits fields are set correctly, >so that hub_event() will do the right thing when it runs (before the >hub is allowed to go back into suspend). > >Therefore if a port is in a funny state, hub_activate() should detect >this and set one of these bit fields appropriately. That's one reason >why it contains all those tests for portstatus and portchange. If a >missing test needs to be added, that's what you should do. That would be like the v3 of my patch, which started this thread and detected and handled the warm-reset request of the port in hub_activate() on resume. Maybe we can set some of these bits instead, or its already handled by the hub_port_reset() call; have to check. I also included a patch in some previous mail after Mathias Nyman commented about the NULL status_urb, which added usb_hcd_poll_rh_status directly after the root hubs usb_remote_wakeup call. A lot of stuff is deferred via workqueue or polled. As I understand it - in my case - resume, suspend and status race and miss each other in some endless loop. As things are, I just have the new desktop HWs and a single usb-c device to test the ports of them. I have no idea what would happen with an additional non-root hub, but I can probably organize one, if I have to test it. Jan-Marek