On 30.09.24 at 08:27, Stefan Wahren wrote:
Hi,

I recently submitted commit d483f034f032 ("usb: dwc2: Skip clock gating on Broadcom SoCs") in order to fix an issue with suspend handling. But this change reveals another issue, at least on Raspberry Pi 3 B (arm64/defconfig), in the following scenario:

1. Power off and disconnect all external USB devices from the Raspberry Pi 3 B Plus
2. Power on the Raspberry Pi
3. Wait until it has booted successfully
4. Connect USB devices such as a keyboard

The expected behavior would be that all devices are enumerated, but this doesn't happen. Instead I observe that the DWC2 stays in lx_state DWC2_L2 forever (bad case):

[ 2.334366] dwc2 3f980000.usb: supply vusb_d not found, using dummy regulator
[ 2.341892] dwc2 3f980000.usb: supply vusb_a not found, using dummy regulator
[ 2.400027] dwc2 3f980000.usb: DWC OTG Controller
[ 2.404868] dwc2 3f980000.usb: new USB bus registered, assigned bus number 1
[ 2.412087] dwc2 3f980000.usb: irq 51, io mem 0x3f980000
[ 2.711826] usb 1-1: new high-speed USB device number 2 using dwc2
[ 3.195838] usb 1-1.1: new high-speed USB device number 3 using dwc2
[ 3.435829] dwc2 3f980000.usb: dwc2_port_suspend
[ 3.459914] dwc2 3f980000.usb: _dwc2_hcd_suspend
[ 9.009743] dwc2 3f980000.usb: _dwc2_hcd_resume
[ 9.030667] dwc2 3f980000.usb: dwc2_port_suspend
[ 9.044137] dwc2 3f980000.usb: _dwc2_hcd_suspend
[ 9.044222] dwc2 3f980000.usb: _dwc2_hcd_resume
[ 9.354370] usb 1-1.1: new high-speed USB device number 4 using dwc2
[ 9.584095] dwc2 3f980000.usb: dwc2_port_suspend
[ 9.599997] dwc2 3f980000.usb: _dwc2_hcd_suspend
I have now spent several hours investigating this issue and gained at least some insights. The reason why the DWC2 is stuck in the suspend state is that the relevant irq 51 doesn't fire anymore in the bad case. I didn't figure out what's causing this, but I suspect it is related to the call of dwc2_port_suspend() by dwc2_hcd_hub_control().

According to the implementation, it looks like dwc2_hcd_hub_control() assumes that the bus is suspended if dwc2_port_suspend() returns 0. But for the corner case (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_NONE && hsotg->params.no_clock_gating) the function returns 0 while bus_suspended stays false. So it seems to me that usb/core and dwc2 get out of sync about their states. As a hack I made dwc2_port_suspend() return an error for this corner case, which prevents this issue, but also prevents pm_runtime.

Maybe this is a relevant side note: in the bad case the onboard Ethernet chip LAN7800 is also not probed after startup (just root and the hubs). It looks like a race between LAN7800 enumeration and pm_runtime.

Another hint as to why this seems to happen only on the Raspberry Pi is that there is no PHY or clock control available to Linux.
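
To illustrate the suspected mismatch, here is a minimal standalone model in C. It is not the actual dwc2 code: every type and function with a model_ prefix is hypothetical, and the only thing taken from the report above is the condition (power_down is NONE and no_clock_gating is set) under which the suspend call returns 0 while bus_suspended stays false. The second function models the hack of returning an error instead; the value -1 is a placeholder, not the error code used in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver state; not the real struct dwc2_hsotg. */
enum model_power_down { MODEL_POWER_DOWN_NONE, MODEL_POWER_DOWN_OTHER };

struct model_hsotg {
	enum model_power_down power_down;
	bool no_clock_gating;
	bool bus_suspended;
};

/* Models the reported behaviour: returns 0 even if nothing was suspended. */
static int model_port_suspend(struct model_hsotg *h)
{
	if (h->power_down == MODEL_POWER_DOWN_NONE && h->no_clock_gating)
		return 0;		/* corner case: bus_suspended stays false */

	h->bus_suspended = true;	/* assumption: other configurations suspend */
	return 0;
}

/* Models the hack: report an error instead of silently doing nothing. */
static int model_port_suspend_hack(struct model_hsotg *h)
{
	if (h->power_down == MODEL_POWER_DOWN_NONE && h->no_clock_gating)
		return -1;		/* placeholder error value */

	h->bus_suspended = true;
	return 0;
}

/*
 * True if a caller that treats "ret == 0" as "bus is suspended" would
 * disagree with the controller's own bus_suspended flag.
 */
static bool model_out_of_sync(int ret, const struct model_hsotg *h)
{
	assert(h != NULL);
	return ret == 0 && !h->bus_suspended;
}
```

In this model the first variant leaves caller and controller out of sync for the corner case, while the hack variant makes the failure visible to the caller. Whether the real driver should return an error here or handle the case differently is exactly the open question.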