OK, I was actually able to get it working! Setting snps,bus-suspend-enable;
and adding this simple change

diff --git a/drivers/usb/dwc3/dwc3-msm.c b/drivers/usb/dwc3/dwc3-msm.c
index 801381de3769..09229d25b39a 100644
--- a/drivers/usb/dwc3/dwc3-msm.c
+++ b/drivers/usb/dwc3/dwc3-msm.c
@@ -6331,11 +6331,11 @@ static void handle_state_peripheral(struct dwc3_msm *mdwc, bool *work)
 static void handle_state_peripheral_suspend(struct dwc3_msm *mdwc)
 {
 	struct dwc3 *dwc = platform_get_drvdata(mdwc->dwc3);
-	if (!test_bit(B_SESS_VLD, &mdwc->inputs)) {
+	if (test_bit(B_SESS_VLD, &mdwc->inputs)) {
 		dev_dbg(mdwc->dev, "BSUSP: !bsv\n");
 		mdwc->drd_state = DRD_STATE_IDLE;
 		cancel_delayed_work_sync(&mdwc->sdp_check);
 		dwc3_otg_start_peripheral(mdwc, 0);
 	} else if (!test_bit(B_SUSPEND, &mdwc->inputs)) {

allows the device to successfully enumerate after a host reboot. Can you
provide some feedback on the correctness of this patch?

Thanks

On Thu, Feb 9, 2023 at 5:41 PM Jerry Zhang <jerry@xxxxxxxxxx> wrote:
>
> Thanks for the detailed responses.
>
> On Thu, Feb 9, 2023 at 12:11 AM Jack Pham <quic_jackp@xxxxxxxxxxx> wrote:
> >
> > Hi Jerry,
> >
> > On Wed, Feb 08, 2023 at 07:27:04PM -0800, Jerry Zhang wrote:
> > > We have a custom board with two Linux systems connected by USB 3 wires
> > > only; VBUS and USB2 are omitted for space savings. This has pretty
> > > much worked, as the controllers are independent, except for the
> > > following bug:
> > >
> > > - When the host system (tegra xhci host driver) reboots, the device
> > > (msm-dwc3) enters the U3 state and never leaves it, even after the
> > > host powers back up.
> > > - Also, if the device system happens to finish booting before the host,
> > > the same thing happens: dwc3 gets stuck in U3 and never enumerates.
> >
> > In both of these scenarios, when the host is momentarily offline, what
> > is the state of the SuperSpeed signal lines? Specifically, does the host
> > remove terminations from its SSTX lines?
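To reason about the patch at the top of this mail, it helps to isolate just the branch selection in handle_state_peripheral_suspend(). The sketch below is a hypothetical userspace model, not driver code: the bit numbers and the MODEL_* names are placeholders, and it deliberately does not guess what the else-if body does, because the quoted hunk is cut off there. It assumes B_SESS_VLD stays set while the host is away, which is what the "BPER bsv && susp" log line further down suggests.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical model of the first decision in handle_state_peripheral_suspend()
 * (dwc3-msm.c). Bit numbers are placeholders, not the driver's real values.
 */
enum model_input_bit { MODEL_B_SESS_VLD = 0, MODEL_B_SUSPEND = 1 };

enum model_branch {
	MODEL_BRANCH_IDLE,          /* "BSUSP: !bsv" block: go IDLE, stop the peripheral */
	MODEL_BRANCH_NOT_SUSPENDED, /* the "else if (!test_bit(B_SUSPEND, ...))" block */
	MODEL_BRANCH_NONE,          /* neither branch: remain in peripheral_suspend */
};

/* Userspace stand-in for the kernel's test_bit() on a single word. */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(*addr) * CHAR_BIT));
	return (*addr >> nr) & 1UL;
}

/*
 * patched == false models the original check, !test_bit(B_SESS_VLD, ...);
 * patched == true models the inverted check from the proposed diff.
 */
static enum model_branch model_peripheral_suspend(unsigned long inputs, bool patched)
{
	bool bsv = model_test_bit(MODEL_B_SESS_VLD, &inputs);
	bool idle_branch = patched ? bsv : !bsv;

	if (idle_branch)
		return MODEL_BRANCH_IDLE;
	if (!model_test_bit(MODEL_B_SUSPEND, &inputs))
		return MODEL_BRANCH_NOT_SUSPENDED;
	return MODEL_BRANCH_NONE;
}
```

With B_SESS_VLD set, the unpatched model never returns MODEL_BRANCH_IDLE, while the patched model returns it regardless of B_SUSPEND. Put differently, the inverted check takes the "go to IDLE and stop the peripheral" path whenever this state is evaluated with bsv set, including on an ordinary bus resume, which may be worth weighing when judging the patch.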
> I haven't been able to verify this, but assuming the generic behavior is for
> those terminations to be removed when the host is powered off or held in
> reset, that's probably what's happening here.
> As I'll mention below, the issue is reproducible with a generic Linux
> desktop, which is why I'm fairly confident our host isn't doing
> anything weird.
> >
> > > I'm able to get these messages from the dwc3 driver when the host reboots:
> > >
> > > [ 34.549834] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> > > [ 34.555749] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=28100C
> > > [ 34.562902] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler link state = 0x0006
> > > [ 34.570319] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=281000
> > > [ 34.663734] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> > > [ 34.669644] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=2C1004
> > > [ 34.676698] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=2C1000
> > > [ 34.686082] dwc3 a600000.dwc3: dwc3_gadget_suspend_interrupt Entry to 3
> > > [ 34.692919] dwc3 a600000.dwc3: Notify controller from dwc3_gadget_vbus_draw. mA = 2
> > > [ 34.700777] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_SET_CURRENT_DRAW_EVENT received
> > > [ 34.708648] dwc3 a600000.dwc3: Notify OTG from dwc3_gadget_suspend_interrupt
> > > [ 34.715888] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_NOTIFY_OTG_EVENT received
> >
> > (BTW, I notice from these msm-dwc3 logs that you must be using a Qualcomm
> > SoC with a downstream kernel. Though I think the issue is generic enough to
> > debug with the upstream dwc3 gadget, if it does turn out to be some
> > vendor-specific behavior then I would ask you to contact us directly for
> > more focused support.)
> Yep, the issue can be reproduced with a QRB5165 devkit plugged into a
> Linux desktop using a cable with USB2 snipped.
> The dwc3-msm driver in our kernel is identical to the one in
> https://git.codelinaro.org/clo/la/kernel/msm-5.4.git.
> >
> > If possible, please enable dwc3 tracing events, as we might be able to see
> > more info about the specific events that occur when the host reboots.
> I did this by mounting tracefs and running echo 1 > events/dwc3/enable, and
> it produces a trace file; however, the events at the end of the trace
> look like this:
>
> <idle>-0 [006] d.s5 140.648282: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
> <idle>-0 [000] dnh1 140.736735: dwc3_readl: addr 00000000f7508d19 value 00000004
> <idle>-0 [000] dnh1 140.736739: dwc3_readl: addr 00000000967e799a value 00001000
> <idle>-0 [000] dnh1 140.736741: dwc3_writel: addr 00000000967e799a value 80001000
> <idle>-0 [000] dnh1 140.736743: dwc3_writel: addr 00000000f7508d19 value 00000004
> kworker/u17:10-767 [002] d..1 140.736770: dwc3_event: event (00030601): End-Of-Frame [U3]
> kworker/u17:10-767 [002] dn.1 140.781424: dwc3_readl: addr 00000000967e799a value 80001000
> kworker/u17:10-767 [002] dn.1 140.781426: dwc3_writel: addr 00000000967e799a value 00001000
>
> These seem to be data events from the end of the connection, and I
> don't see any events related to suspend or power state.
>
> > > I think the main thing I'm looking for is validating my existing
> > > understanding and confirming a few things I suspect, but am not sure
> > > of due to unfamiliarity with the details of the USB3 spec:
> > >
> > > - IIUC, USB3 power management and states should actually be independent
> > > of both VBUS and the USB2 lines, as all the negotiation happens with LFPS
> > > over the USB3 wires.
> >
> > Yes, but in the corner scenario above with the host going offline, you
> > might be in a situation in which the device abruptly exits its U0 state
> > due to a perceived disconnection or lack of communication on the SS
> > pins.
> > It might be that the LTSSM has transitioned to the eSS.Disabled
> > state, in which case one of the only ways out of that state is, to quote
> > from the USB 3.2 spec (7.5.1.1.2 Exit from eSS.Disabled):
> >
> > "An upstream port shall transition to Rx.Detect only when VBUS
> > transitions to valid or a USB 2.0 bus reset is detected."
> >
> > But since you have neither VBUS nor the USB2 lines connected, it's possible
> > the controller has gotten stuck here with no way out.
> >
> > :) There is a reason that spec-compliant USB3.x implementations must
> > also provide D+/D- connectivity: not only for backwards compatibility
> > but also for these sorts of fallback scenarios.
> Understood. We knew we were getting into sketchy territory with this,
> but we're actually port splitting on the host side and using that USB2
> port for a different device, which helps us avoid the need for a hub.
> For embedded systems with a fixed topology, this strategy has a lot of
> advantages if we can get it working.
> >
> > > - I see that entry to U3 requires an LFPS message, but in this case
> > > the host wouldn't have been able to send a message as it is powering
> > > off. Is the device also capable of entering U3 due to timeouts, and is
> > > it expected to enter U3 in this situation?
> >
> > In this case, since it's obviously not a U3 entry due to LFPS, the only
> > other interpretation of the dwc3's U3 link state is that it is a
> > HS/FS/LS Suspend/L2 state. This can occur simply from a lack of activity
> > on the D+/D- lines.
> >
> > > - Similarly, I've seen that exiting from U3 requires an LFPS message.
> > > My expectation is that the host would wake up all devices on the bus
> > > with LFPS messages shortly after bootup, and either this isn't
> > > happening, or the device is failing to receive or process the message.
> > > If the entry to U3 is expected, how is it then expected to exit U3?
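The spec rule Jack quotes can be made concrete with a toy state machine. This is an illustrative C model, not dwc3 or spec code: the state and event names are made up for the sketch, and it assumes, following Jack's hypothesis, that losing the host lands the upstream port in eSS.Disabled (the real path through Recovery, SS.Inactive and Rx.Detect is collapsed into one step). It shows why LFPS from a rebooted host cannot help once the port is there, and why a VBUS-valid transition or a USB 2.0 bus reset is needed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of an upstream port's LTSSM, reduced to the states discussed in
 * this thread. Assumption: losing the host drops the port straight into
 * eSS.Disabled; intermediate states are collapsed into that single step.
 */
enum ltssm_state { LTSSM_U0, LTSSM_U3, LTSSM_SS_DISABLED, LTSSM_RX_DETECT };

enum ltssm_event {
	EV_LFPS_WAKE,      /* host drives LFPS (e.g. a U3 wake-up) */
	EV_LINK_LOST,      /* host goes away: reboot, terminations removed */
	EV_VBUS_VALID,     /* VBUS transitions to valid */
	EV_USB2_BUS_RESET, /* USB 2.0 bus reset detected on D+/D- */
};

static enum ltssm_state ltssm_step(enum ltssm_state s, enum ltssm_event ev)
{
	if (s == LTSSM_SS_DISABLED) {
		/* USB 3.2 spec 7.5.1.1.2: exit only on VBUS valid or a USB 2.0 bus reset. */
		if (ev == EV_VBUS_VALID || ev == EV_USB2_BUS_RESET)
			return LTSSM_RX_DETECT;
		return LTSSM_SS_DISABLED; /* LFPS from the host is ignored here */
	}
	if (ev == EV_LINK_LOST)
		return LTSSM_SS_DISABLED;
	if (s == LTSSM_U3 && ev == EV_LFPS_WAKE)
		return LTSSM_U0; /* simplified: the real exit passes through Recovery */
	return s;
}

/* Feed a sequence of events through the model and return the final state. */
static enum ltssm_state ltssm_run(enum ltssm_state s, const enum ltssm_event *evs, size_t n)
{
	assert(evs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		s = ltssm_step(s, evs[i]);
	return s;
}
```

Feeding {EV_LINK_LOST, EV_LFPS_WAKE} through ltssm_run() leaves the model in LTSSM_SS_DISABLED, consistent with the stuck behavior described above; appending EV_VBUS_VALID or EV_USB2_BUS_RESET moves it to LTSSM_RX_DETECT.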
> >
> > I think what might have happened is that when the host rebooted, the
> > device must have abruptly exited U0 and gone into eSS.Disabled; at that
> > point the dwc3 controller "falls back" into USB2 mode, but since D+/D-
> > are not connected, this is then perceived as entering USB2 suspend.
> > Being in eSS.Disabled could explain why it doesn't respond to further
> > LFPS signaling from the host.
> >
> > You'd somehow need to get the controller to go back into Rx.Detect.
> > Since you don't have a way to do a USB2 reset on D+/D-, you may need to
> > at least simulate a VBUS toggle, or forcefully reset the dwc3 controller.
> >
> > With the QCOM HW there is a register that can do this. Please refer to
> > dwc3_qcom_vbus_override_enable() in dwc3-qcom.c for the upstream
> > implementation.
> The equivalent of this is already being called in dwc3-msm.c as
> dwc3_override_vbus_status, except that it doesn't set the SW_SESSVLD_SEL
> flag; I added that, but didn't notice any difference. I'm assuming
> dwc3-msm and dwc3-qcom are different implementations targeting the
> same device?
>
> I did finally find a quirk that seems promising, though. I see in
> dwc3-msm that resume_work is skipped if the enable_bus_suspend bit is
> not set:
>
>         case DWC3_CONTROLLER_NOTIFY_OTG_EVENT:
>                 dev_err(mdwc->dev, "DWC3_CONTROLLER_NOTIFY_OTG_EVENT received\n");
>                 if (dwc->enable_bus_suspend) {
>                         mdwc->suspend = dwc->b_suspend;
>                         queue_work(mdwc->dwc3_wq, &mdwc->resume_work);
>                 }
>                 break;
>
> and indeed we don't have it set, so I tried enabling
> snps,bus-suspend-enable.
> Now the log looks a bit different:
>
> [ 140.600806] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> [ 140.606720] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=28100C
> [ 140.613873] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler link state = 0x0006
> [ 140.621291] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=281000
> [ 140.714729] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> [ 140.720635] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=2C1004
> [ 140.727688] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=2C1000
> [ 140.736782] dwc3 a600000.dwc3: dwc3_gadget_suspend_interrupt Entry to 3
> [ 140.743600] dwc3 a600000.dwc3: Notify controller from dwc3_gadget_vbus_draw. mA = 2
> [ 140.751465] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_SET_CURRENT_DRAW_EVENT received
> [ 140.759335] dwc3 a600000.dwc3: Notify OTG from dwc3_gadget_suspend_interrupt
> [ 140.766575] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_NOTIFY_OTG_EVENT received
> [ 140.773906] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_NOTIFY_OTG_EVENT processing
> [ 140.781433] msm-dwc3 a600000.ssusb: dwc3_resume_work: dwc3 resume work
> [ 140.788182] msm-dwc3 a600000.ssusb: peripheral state
> [ 140.793307] msm-dwc3 a600000.ssusb: BPER bsv && susp
> [ 141.296798] msm-dwc3 a600000.ssusb: DWC3-msm runtime idle
> [ 142.048465] msm-dwc3 a600000.ssusb: DWC3-msm runtime suspend
> [ 142.054800] msm-dwc3 a600000.ssusb: DWC3 in low power mode
>
> That is the log from when the host first powers off.
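The raw numbers in these logs, and in the earlier dwc3_event trace, can be decoded with a few lines of C. This is a sketch under stated assumptions: the link-state table mirrors enum dwc3_link_state in drivers/usb/dwc3/core.h, the event layout mirrors struct dwc3_event_devt, the two P3 PWR_EVNT bit positions are assumptions based on dwc3-msm.c (only the LPM_OUT_L2 bit is directly confirmed by the "handling PWR_EVNT_LPM_OUT_L2_MASK" line), and it assumes the printed "link state = 0x0006" uses the same encoding as enum dwc3_link_state.

```c
#include <assert.h>
#include <stdint.h>

/* Link-state encoding from enum dwc3_link_state (drivers/usb/dwc3/core.h).
 * Expects an already-extracted 4-bit field. */
static const char *dwc3_link_state_name(unsigned int ls)
{
	assert(ls <= 0xf);
	switch (ls) {
	case 0x0: return "U0";
	case 0x1: return "U1";
	case 0x2: return "U2";
	case 0x3: return "U3";
	case 0x4: return "SS.Disabled";
	case 0x5: return "Rx.Detect";
	case 0x6: return "SS.Inactive";
	case 0x7: return "Polling";
	case 0x8: return "Recovery";
	case 0x9: return "Hot Reset";
	case 0xa: return "Compliance";
	case 0xb: return "Loopback";
	case 0xe: return "Reset";
	case 0xf: return "Resume";
	default:  return "reserved";
	}
}

/* PWR_EVNT bits the logs show being consumed. LPM_OUT_L2 is confirmed by the
 * log; the P3 names and positions are assumptions based on dwc3-msm.c. */
#define PWR_EVNT_POWERDOWN_IN_P3_MASK  (1u << 2)
#define PWR_EVNT_POWERDOWN_OUT_P3_MASK (1u << 3)
#define PWR_EVNT_LPM_OUT_L2_MASK       (1u << 5)

/* Bits left after removing the handled ones; for the three irq_stat values in
 * these logs this matches the "unexpected PWR_EVNT" value that gets printed. */
static uint32_t pwr_evnt_unexpected(uint32_t irq_stat)
{
	return irq_stat & ~(uint32_t)(PWR_EVNT_POWERDOWN_IN_P3_MASK |
				      PWR_EVNT_POWERDOWN_OUT_P3_MASK |
				      PWR_EVNT_LPM_OUT_L2_MASK);
}

/* Layout of struct dwc3_event_devt: bit 0 set and bits 7:1 zero for a device
 * event, type in bits 11:8, event_info in bits 24:16. */
struct dwc3_devt_info {
	int valid;
	unsigned int type;
	unsigned int link_state; /* low nibble of event_info */
};

static struct dwc3_devt_info dwc3_devt_decode(uint32_t raw)
{
	struct dwc3_devt_info info = {0, 0, 0};

	if ((raw & 0x1) == 0 || ((raw >> 1) & 0x7f) != 0)
		return info; /* not a device-specific event */
	info.valid = 1;
	info.type = (raw >> 8) & 0xf;
	info.link_state = (raw >> 16) & 0xf;
	return info;
}
```

Under those assumptions, link state 0x0006 decodes to SS.Inactive rather than U3, and event 0x00030601 decodes to type 6 with link state 3 (U3). On newer controller revisions dwc3 handles type 6 as its suspend notification, which lines up with the dwc3_gadget_suspend_interrupt lines in the logs.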
>
> [ 166.306367] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> [ 166.312277] msm-dwc3 a600000.ssusb: USB Resume start
> [ 166.317484] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq_thread
> [ 166.323235] msm-dwc3 a600000.ssusb: dwc3_resume_work: dwc3 resume work
> [ 166.330001] msm-dwc3 a600000.ssusb: dwc3_msm_resume: exiting lpm
> [ 166.336493] msm-dwc3 a600000.ssusb: dwc3_msm_resume: truly resuming ss phy
> [ 166.343649] msm-dwc3 a600000.ssusb: DWC3 exited from low power mode
> [ 166.350125] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=3C1020
> [ 166.357237] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: handling PWR_EVNT_LPM_OUT_L2_MASK
> [ 166.366020] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=3C1000
> [ 166.375094] msm-dwc3 a600000.ssusb: dwc3_resume_work: dwc3 resuming
> [ 166.381580] msm-dwc3 a600000.ssusb: peripheral_suspend state
>
> And these are the messages we get when the host powers back up. I can
> verify that the timing of these messages changes depending on how long the
> host is held in reset, so it's definitely detecting the host here rather
> than hitting some time-based event. All these events look correct, though,
> as it claims to be resuming; however, there still isn't enumeration and
> the link state is still U3. The last line still claims to be in the
> suspend state, and this is probably what's preventing the resume from
> completing. Looking through the code, it seems to depend on the
> B_SESS_VLD bit:
>
>         if (!test_bit(B_SESS_VLD, &mdwc->inputs)) {
>                 dev_err(mdwc->dev, "BSUSP: !bsv\n");
>                 mdwc->drd_state = DRD_STATE_IDLE;
>                 cancel_delayed_work_sync(&mdwc->sdp_check);
>                 dwc3_otg_start_peripheral(mdwc, 0);
>
> so somehow this if statement isn't triggering. Does this seem like the
> right track?
>
> > > I've also tried relevant-looking quirks on the gadget side, including
> > > ssp-u3-u0-quirk, u2exit_lfps_quirk, and disable_scramble_quirk.
> > > I don't see a way to explicitly prevent the controller from entering
> > > U3 mode; is this possible with a register setting?
> > >
> > > Would appreciate any thoughts. If I haven't misunderstood anything,
> > > the next step would probably be to find a Beagle 5000 analyzer and
> > > track down the LFPS messages.
> >
> > I think this is still a good idea, if only to see what's happening on the
> > signal lines at a lower level. It would be great if it could show the
> > state of termination when the host is rebooting.
> Unfortunately we don't have one on hand, so this will probably be a
> last resort if none of the other paths pan out.
> >
> > Hope that helps,
> > Jack
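Jack's suggestion to simulate a VBUS toggle through the QCOM override register could be sketched as follows. This is a userspace model with a hypothetical fake register block (struct fake_qscratch, fake_reg, vbus_override and simulate_vbus_toggle are illustrative names), not driver code; the register offsets and bit names follow upstream dwc3-qcom.c's dwc3_qcom_vbus_override_enable() and should be checked against the tree in use. The spec text quoted earlier asks for VBUS to transition to valid, so the model counts 0 to 1 edges: setting the override once and leaving it set produces a single edge, while a clear-then-set produces another. Whether the controller actually treats such a toggle as a VBUS-valid transition for the LTSSM has not been verified in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets/bits as used by dwc3_qcom_vbus_override_enable() upstream;
 * verify against your own tree. */
#define QSCRATCH_HS_PHY_CTRL 0x10
#define UTMI_OTG_VBUS_VALID  (1u << 20)
#define SW_SESSVLD_SEL       (1u << 28)
#define QSCRATCH_SS_PHY_CTRL 0x30
#define LANE0_PWR_PRESENT    (1u << 24)

/* Hypothetical stand-in for the ioremapped QSCRATCH block. */
struct fake_qscratch {
	uint32_t regs[0x40 / 4];
	unsigned int vbus_valid_edges; /* counts 0 -> 1 transitions of UTMI_OTG_VBUS_VALID */
};

static uint32_t *fake_reg(struct fake_qscratch *q, uint32_t off)
{
	assert(off % 4 == 0 && off / 4 < sizeof(q->regs) / sizeof(q->regs[0]));
	return &q->regs[off / 4];
}

/* Mirrors the set/clear pattern of the upstream override helper. */
static void vbus_override(struct fake_qscratch *q, bool enable)
{
	uint32_t *hs = fake_reg(q, QSCRATCH_HS_PHY_CTRL);
	uint32_t *ss = fake_reg(q, QSCRATCH_SS_PHY_CTRL);

	if (enable) {
		if (!(*hs & UTMI_OTG_VBUS_VALID))
			q->vbus_valid_edges++;
		*ss |= LANE0_PWR_PRESENT;
		*hs |= UTMI_OTG_VBUS_VALID | SW_SESSVLD_SEL;
	} else {
		*ss &= ~LANE0_PWR_PRESENT;
		*hs &= ~(UTMI_OTG_VBUS_VALID | SW_SESSVLD_SEL);
	}
}

/*
 * Simulated VBUS toggle: drop the override, then reassert it. A real driver
 * would use readl()/writel() on the mapped registers and would likely wait
 * between the two steps; both are omitted in this userspace sketch.
 */
static void simulate_vbus_toggle(struct fake_qscratch *q)
{
	vbus_override(q, false);
	vbus_override(q, true);
}
```

In this model, calling vbus_override(q, true) a second time leaves vbus_valid_edges unchanged, while simulate_vbus_toggle() increments it, which is the distinction between a one-time override and a toggle.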