Re: dwc3 stuck in U3 state on USB3-only link

OK, I was actually able to get it working! Setting
snps,bus-suspend-enable; and adding this simple change

diff --git a/drivers/usb/dwc3/dwc3-msm.c b/drivers/usb/dwc3/dwc3-msm.c
index 801381de3769..09229d25b39a 100644
--- a/drivers/usb/dwc3/dwc3-msm.c
+++ b/drivers/usb/dwc3/dwc3-msm.c
@@ -6331,11 +6331,11 @@ static void handle_state_peripheral(struct dwc3_msm *mdwc, bool *work)

 static void handle_state_peripheral_suspend(struct dwc3_msm *mdwc)
 {
        struct dwc3 *dwc = platform_get_drvdata(mdwc->dwc3);

-       if (!test_bit(B_SESS_VLD, &mdwc->inputs)) {
+       if (test_bit(B_SESS_VLD, &mdwc->inputs)) {
                dev_dbg(mdwc->dev, "BSUSP: !bsv\n");
                mdwc->drd_state = DRD_STATE_IDLE;
                cancel_delayed_work_sync(&mdwc->sdp_check);
                dwc3_otg_start_peripheral(mdwc, 0);
        } else if (!test_bit(B_SUSPEND, &mdwc->inputs)) {

allows the device to successfully enumerate after a host reboot. Can
you provide some feedback on the correctness of this patch?
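
To make the effect of the flipped test concrete, here is a minimal
userspace sketch of the first branches of
handle_state_peripheral_suspend(). It is a model, not driver code: the
struct, the "patched" flag and the function names are made up for
illustration, and the body of the B_SUSPEND branch (which is not in the
hunk above) is assumed to go back to the peripheral state.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the names mirror the driver but nothing here is driver code. */
enum drd_state {
	DRD_STATE_IDLE,
	DRD_STATE_PERIPHERAL,
	DRD_STATE_PERIPHERAL_SUSPEND,
};

/* Hypothetical stand-in for the B_SESS_VLD and B_SUSPEND bits in mdwc->inputs. */
struct model_inputs {
	bool b_sess_vld;
	bool b_suspend;
};

/*
 * Next state when the peripheral_suspend handler runs.
 * patched == false models the original "!test_bit(B_SESS_VLD, ...)" check,
 * patched == true models the flipped check from the diff.
 */
enum drd_state peripheral_suspend_next(struct model_inputs in, bool patched)
{
	bool go_idle = patched ? in.b_sess_vld : !in.b_sess_vld;

	if (go_idle)
		return DRD_STATE_IDLE;	/* driver calls dwc3_otg_start_peripheral(mdwc, 0) here */
	if (!in.b_suspend)
		return DRD_STATE_PERIPHERAL;	/* ASSUMPTION: branch body is not in the hunk */
	return DRD_STATE_PERIPHERAL_SUSPEND;
}

/* The case as I read it from the logs: session valid, bus still flagged suspended. */
void model_self_check(void)
{
	struct model_inputs in = { .b_sess_vld = true, .b_suspend = true };

	assert(peripheral_suspend_next(in, false) == DRD_STATE_PERIPHERAL_SUSPEND);
	assert(peripheral_suspend_next(in, true) == DRD_STATE_IDLE);
}
```

With bsv set and B_SUSPEND still set, the original check never leaves
peripheral_suspend, while the flipped one drops to idle. My guess is
that the idle state then restarts the peripheral because bsv is still
set, which would amount to a controller reset. The flip side, in this
model, is that a genuine !bsv no longer takes the first branch, which is
the part I'm least sure is correct.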

Thanks

On Thu, Feb 9, 2023 at 5:41 PM Jerry Zhang <jerry@xxxxxxxxxx> wrote:
>
> Thanks for the detailed responses
> On Thu, Feb 9, 2023 at 12:11 AM Jack Pham <quic_jackp@xxxxxxxxxxx> wrote:
> >
> > Hi Jerry,
> >
> > On Wed, Feb 08, 2023 at 07:27:04PM -0800, Jerry Zhang wrote:
> > > We have a custom board with two linux systems connected by USB 3 wires
> > > only, vbus and USB2 are omitted for space savings. This has pretty
> > > much worked as the controllers are independent, except for the
> > > following bug:
> > >
> > > - When the host system (tegra xhci host driver) reboots, the device
> > > (msm-dwc3) enters the U3 state and never leaves it, even after the
> > > host powers back up.
> > > - Also if the device system happens to finish booting before the host,
> > > the same thing happens, dwc3 gets stuck in U3 and never enumerates.
> >
> > In both of these scenarios when the host is momentarily offline, what
> > is the state of the superspeed signal lines?  Specifically, does the host
> > remove terminations from its SSTX lines?
> I haven't been able to verify but assuming the generic behavior is for
> those terminations to be removed if the host is powered off or held in
> reset, then that's probably what's happening here.
> As I'll mention below, the issue is reproducible with a generic linux
> desktop, hence why I'm somewhat confident our host isn't doing
> anything weird.
> >
> > > I'm able to get these messages from the dwc3 driver when the host reboots
> > >
> > > [   34.549834] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> > > [   34.555749] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=28100C
> > > [   34.562902] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler link state = 0x0006
> > > [   34.570319] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=281000
> > > [   34.663734] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> > > [   34.669644] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=2C1004
> > > [   34.676698] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=2C1000
> > > [   34.686082] dwc3 a600000.dwc3: dwc3_gadget_suspend_interrupt Entry to 3
> > > [   34.692919] dwc3 a600000.dwc3: Notify controller from dwc3_gadget_vbus_draw. mA = 2
> > > [   34.700777] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_SET_CURRENT_DRAW_EVENT received
> > > [   34.708648] dwc3 a600000.dwc3: Notify OTG from dwc3_gadget_suspend_interrupt
> > > [   34.715888] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_NOTIFY_OTG_EVENT received
> >
> > (BTW I notice from these msm-dwc3 logs you must be using a Qualcomm SoC
> > with a downstream kernel.  Though I think the issue is generic enough to
> > debug with the upstream dwc3 gadget, if it does turn out to be some
> > vendor-specific behavior then I would ask you to contact us directly for
> > more focused support.)
> Yep the issue can be reproduced with a QRB5165 devkit plugged into a
> linux desktop using a cable with USB2 snipped. dwc3-msm in our kernel
> is identical to that in
> https://git.codelinaro.org/clo/la/kernel/msm-5.4.git.
> >
> > If possible please enable dwc3 tracing events as we might be able to see
> > more info about the specific events that occur when the host reboots.
> I did this by mounting tracefs and running echo 1 > events/dwc3/enable,
> which produces a trace file. However, the events at the end of the trace
> look like this:
>           <idle>-0       [006] d.s5   140.648282: dwc3_gadget_ep_cmd: ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000 --> status: Successful
>           <idle>-0       [000] dnh1   140.736735: dwc3_readl: addr 00000000f7508d19 value 00000004
>           <idle>-0       [000] dnh1   140.736739: dwc3_readl: addr 00000000967e799a value 00001000
>           <idle>-0       [000] dnh1   140.736741: dwc3_writel: addr 00000000967e799a value 80001000
>           <idle>-0       [000] dnh1   140.736743: dwc3_writel: addr 00000000f7508d19 value 00000004
>   kworker/u17:10-767     [002] d..1   140.736770: dwc3_event: event (00030601): End-Of-Frame [U3]
>   kworker/u17:10-767     [002] dn.1   140.781424: dwc3_readl: addr 00000000967e799a value 80001000
>   kworker/u17:10-767     [002] dn.1   140.781426: dwc3_writel: addr 00000000967e799a value 00001000
>
> These seem to be data events from the end of the connection, and I
> don't see any events related to suspend or power state.
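
Looking at this again: the raw event word in the trace (00030601) can
be split into fields by hand. The sketch below assumes the bit layout
of struct dwc3_event_devt in drivers/usb/dwc3/core.h (bit 0 flags a
non-endpoint event, bits 11:8 hold the event type, bits 24:16 hold the
event info); the struct and function names here are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative container; the field positions are assumed from struct dwc3_event_devt. */
struct devt_fields {
	unsigned int is_devspec;	/* bit 0: set for non-endpoint events */
	unsigned int source;		/* bits 7:1: 0 means a device event */
	unsigned int type;		/* bits 11:8: device event type */
	unsigned int event_info;	/* bits 24:16: link state for link-related events */
};

/* Split a raw 32-bit device event word into its fields. */
struct devt_fields decode_devt(uint32_t raw)
{
	struct devt_fields f;

	f.is_devspec = raw & 0x1;
	f.source = (raw >> 1) & 0x7f;
	f.type = (raw >> 8) & 0xf;
	f.event_info = (raw >> 16) & 0x1ff;
	return f;
}

/* Decode the event word from the trace above. */
void decode_self_check(void)
{
	struct devt_fields f = decode_devt(0x00030601);

	assert(f.is_devspec == 1);
	assert(f.source == 0);
	assert(f.type == 6);
	assert(f.event_info == 3);
}
```

So this is a device event of type 6 with link state 3 (U3). If I'm
reading the 5.4 sources correctly, type 6 is DWC3_DEVICE_EVENT_EOPF,
which the gadget code treats as the suspend event on newer core
revisions. That would make the "End-Of-Frame [U3]" line the suspend
notification itself, matching the dwc3_gadget_suspend_interrupt message
in dmesg at the same timestamp.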
> > > I think the main thing I'm looking for is validating my existing
> > > understanding and confirming a few things I suspect, but am not sure
> > > of due to unfamiliarity with the details of the USB3 spec:
> > >
> > > - iiuc USB3 power management and states should actually be independent
> > > from both vbus and usb2 lines as all the negotiation happens with LFPS
> > > over the USB3 wires.
> >
> > Yes, but in the corner scenario above with the host going offline, you
> > might be in a situation in which the device abruptly exits its U0 state
> > due to a perceived disconnection or lack of communication on the SS
> > pins.  It might be that the LTSSM could have transitioned to SS.Disabled
> > state--in which case one of the only ways out of that state is, to quote
> > from the USB3.2 spec (7.5.1.1.2 Exit from eSS.Disabled):
> >
> >   "An upstream port shall transition to Rx.Detect only when VBUS
> >    transitions to valid or a USB 2.0 bus reset is detected."
> >
> > But since you don't have VBUS nor usb2 lines connected, it's possible
> > the controller could have gotten stuck here and not have a way out.
> >
> > :) there is a reason that spec compliant USB3.x implementations must
> > also provide D+/D- connectivity; not only for backwards compatibility
> > but also for these sorts of fallback scenarios.
> Understood, we knew we were getting into sketchy territory with this
> but we're actually port splitting on the host side and using that USB2
> slot for a different device, which helps us avoid the need for a hub.
> For embedded systems with a fixed topology, this strategy has a lot of
> advantages if we can get it working.
> >
> > > - I see that entry to U3 requires an LFPS message, but in this case
> > > the host wouldn't have been able to send a message as it is powering
> > > off. Is the device also capable of entering U3 due to timeouts and is
> > > it expected to enter U3 in this situation?
> >
> > In this case since it's obviously not a U3 entry due to LFPS, the only
> > other interpretation of the dwc3's U3 link state is that it is a
> > HS/FS/LS Suspend/L2 state.  This can occur simply by not having activity
> > on D+/D- lines.
> >
> > > - Similarly I've seen that exiting from U3 requires an LFPS message.
> > > My expectation is that the host would wake up all devices on the bus
> > > with LFPS messages shortly after bootup, and either this isn't
> > > happening, or the device is failing to receive or process the message.
> > > If the entry to U3 is expected, how is it then expected to exit U3?
> >
> > I think what might have happened is that when the host rebooted, the
> > device must have abruptly exited U0 and gone into eSS.Disabled; at that
> > point the dwc3 controller "falls back" into USB2 mode but since D+/D-
> > are not connected, it is then perceived as entering USB2 suspend.
> > Being in eSS.Disabled could explain why it doesn't respond to further
> > LFPS signaling from the host.
> >
> > You'd somehow need to get the controller to go back into Rx.Detect.
> > Since you don't have a way to do USB2 reset on D+/D-, you may need to at
> > least simulate a VBUS toggle, or forcefully reset the dwc3 controller.
> >
> > With the QCOM HW there is a register that can do this.  Please refer to
> > dwc3_qcom_vbus_override_enable() in dwc3-qcom.c for the upstream
> > implementation.
> The equivalent of this is already being called in dwc3-msm.c as
> dwc3_override_vbus_status, except for missing the SW_SESSVLD_SEL flag,
> but I added that and I didn't notice any difference. I'm assuming
> dwc3-msm and dwc3-qcom are different implementations targeting the
> same device?
>
> I did manage to finally find a quirk that seems promising though. I
> see in dwc3-msm that resume_work is skipped if the enable_bus_suspend
> bit is not set
>
>      case DWC3_CONTROLLER_NOTIFY_OTG_EVENT:
>          dev_err(mdwc->dev, "DWC3_CONTROLLER_NOTIFY_OTG_EVENT received\n");
>          if (dwc->enable_bus_suspend) {
>              mdwc->suspend = dwc->b_suspend;
>              queue_work(mdwc->dwc3_wq, &mdwc->resume_work);
>          }
>          break;
>
> and indeed we don't have it set so I tried enabling
> snps,bus-suspend-enable. Now the log looks a bit different:
>
> [  140.600806] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> [  140.606720] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=28100C
> [  140.613873] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler link state = 0x0006
> [  140.621291] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=281000
> [  140.714729] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> [  140.720635] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=2C1004
> [  140.727688] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=2C1000
> [  140.736782] dwc3 a600000.dwc3: dwc3_gadget_suspend_interrupt Entry to 3
> [  140.743600] dwc3 a600000.dwc3: Notify controller from dwc3_gadget_vbus_draw. mA = 2
> [  140.751465] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_SET_CURRENT_DRAW_EVENT received
> [  140.759335] dwc3 a600000.dwc3: Notify OTG from dwc3_gadget_suspend_interrupt
> [  140.766575] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_NOTIFY_OTG_EVENT received
> [  140.773906] msm-dwc3 a600000.ssusb: DWC3_CONTROLLER_NOTIFY_OTG_EVENT processing
> [  140.781433] msm-dwc3 a600000.ssusb: dwc3_resume_work: dwc3 resume work
> [  140.788182] msm-dwc3 a600000.ssusb: peripheral state
> [  140.793307] msm-dwc3 a600000.ssusb: BPER bsv && susp
> [  141.296798] msm-dwc3 a600000.ssusb: DWC3-msm runtime idle
> [  142.048465] msm-dwc3 a600000.ssusb: DWC3-msm runtime suspend
> [  142.054800] msm-dwc3 a600000.ssusb: DWC3 in low power mode
>
> is the log when the host first powers off.
>
> [  166.306367] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq received
> [  166.312277] msm-dwc3 a600000.ssusb: USB Resume start
> [  166.317484] msm-dwc3 a600000.ssusb: msm_dwc3_pwr_irq_thread
> [  166.323235] msm-dwc3 a600000.ssusb: dwc3_resume_work: dwc3 resume work
> [  166.330001] msm-dwc3 a600000.ssusb: dwc3_msm_resume: exiting lpm
> [  166.336493] msm-dwc3 a600000.ssusb: dwc3_msm_resume: truly resuming ss phy
> [  166.343649] msm-dwc3 a600000.ssusb: DWC3 exited from low power mode
> [  166.350125] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler irq_stat=3C1020
> [  166.357237] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: handling PWR_EVNT_LPM_OUT_L2_MASK
> [  166.366020] msm-dwc3 a600000.ssusb: dwc3_pwr_event_handler: unexpected PWR_EVNT, irq_stat=3C1000
> [  166.375094] msm-dwc3 a600000.ssusb: dwc3_resume_work: dwc3 resuming
> [  166.381580] msm-dwc3 a600000.ssusb: peripheral_suspend state
>
> and we get these messages when the host powers back up. I can verify
> that their timing changes depending on how long the host is
> held in reset, so it's definitely detecting the host here rather than
> hitting some time-based event. All these events look correct, as
> it claims to be resuming, but there still isn't enumeration and
> the link state is still U3. The last line still claims to be in the
> suspend state and this is probably what's preventing the resume from
> completing. Looking through the code, it seems to depend on the
> B_SESS_VLD bit
>
>      if (!test_bit(B_SESS_VLD, &mdwc->inputs)) {
>          dev_err(mdwc->dev, "BSUSP: !bsv\n");
>          mdwc->drd_state = DRD_STATE_IDLE;
>          cancel_delayed_work_sync(&mdwc->sdp_check);
>          dwc3_otg_start_peripheral(mdwc, 0);
>
> so somehow this if statement isn't triggering. Does this seem like the
> right track?
> >
> > > I've also tried relevant looking quirks on the gadget side including
> > > ssp-u3-u0-quirk, u2exit_lfps_quirk, disable_scramble_quirk. I don't
> > > see a way to explicitly prevent the controller from entering U3 mode,
> > > is this possible with a register setting?
> > >
> > > Would appreciate any thoughts. If I haven't misunderstood anything,
> > > the next step would probably be to find a beagle 5000 analyzer and
> > > track down the LFPS messages.
> >
> > I think this is still a good idea, if at least to see what's happening on the
> > signal lines at a lower level.  Would be great if it could show the
> > state of termination when the host is rebooting.
> Unfortunately we don't have one on hand so this will probably be a
> last resort if none of the other paths pan out.
> >
> > Hope that helps,
> > Jack


