On 1.3.2022 12.28, Henry Lin wrote: >>> USB2 resume starts with usb_hcd_start_port_resume() in port status >>> change handling for RESUME link state. usb_hcd_end_port_resume() call is >>> needed to keep runtime PM balance. > >> For normal usb2 port resume the usb_hcd_end_port_resume() is called when resume >> has been signaled for long enough in xhci_handle_usb2_port_link_resume(). >> >> This is also where driver directs the port to go from Resume state to U0. >> Port can't do this without driver directing it. >> >> If there's a failure during resume signaling (disconnect, reset, error) then >> stale resume variables are detected in xhci_get_port_status() and >> usb_hcd_end_port_resume() is called. > >> I do now see a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status") >> does change order of checking and clearing stale resume variables, but this should >> only happen if the first port state we read is a fully enabled functional U0 state after >> a failed resume. > >> Could you expand a bit how this was detected? > We observed the racing issue when usb2 device-initiated resume occurs in system resume. > If usb2 host-initiated resume for system resume directs U0 before xhci_get_usb2_port_status() > see RESUME state, xhci_get_usb2_port_status() will not finish resume process in > xhci_handle_usb2_port_link_resume(). Its scenario is as follows: > > 1. System resume starts. All driver system resume callbacks get called in order. XHCI controller > is resumed by xhci_resume(). > 2. USB2 root hub is resuming. hcd_bus_resume() is being executed. > 3. Before xhci_bus_resume() is finished, XHCI driver receives a port status change event for > an USB2 port with RESUME link state in xhci_irq(). XHCI driver starts the process to resume > HS port for device-initiated resume. > 4. In xhci_bus_resume(), host-initiated resume (direct U0) is performed on the same port that is > resuming in step 3 in below loop: > > if (bus_state->bus_suspended) { > spin_unlock_irqrestore(&xhci->lock, flags); > msleep(USB_RESUME_TIMEOUT); > spin_lock_irqsave(&xhci->lock, flags); > } > for_each_set_bit(port_index, &bus_state->bus_suspended, > BITS_PER_LONG) { > /* Clear PLC to poll it later for U0 transition */ > xhci_test_and_clear_bit(xhci, ports[port_index], > PORT_PLC); > xhci_set_link_state(xhci, ports[port_index], XDEV_U0); > } > 5. Then, link state of the resuming port is observed as U0 in following > xhci_get_usb2_port_status(). xhci_handle_usb2_port_link_resume() has > no chance to get called on the resuming port. > True, thanks for the explanation. If there's a race between system resume and device-initiated resume, and port is resumed in xhci_bus_resume() then yes I see how this could happen. Maybe only call usb_hcd_end_port_resume() if xhci_irq() detected the device-initiated resume. i.e. set a value to resume_done[portnum] and called usb_hcd_start_port_resume() something like: @@ -1088,6 +1088,8 @@ static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status, if (link_state == XDEV_U2) *status |= USB_PORT_STAT_L1; if (link_state == XDEV_U0) { + if (bus_state->resume_done[portnum]) + usb_hcd_end_port_resume(&port->rhub->hcd->self, portnum); bus_state->resume_done[portnum] = 0; clear_bit(portnum, &bus_state->resuming_ports); Also xhci_bus_resume() only resumes ports that were forcefully suspended to U3 in xhci_bus_suspend(). Could be worth checking why that device wasn't already in U3 when suspend reached xhci_bus_suspend(). Thanks Mathias