Re: dwc3 stuck in U3 state on USB3-only link

Hi Jerry,

On Thu, Feb 09, 2023 at 05:41:43PM -0800, Jerry Zhang wrote:
> On Thu, Feb 9, 2023 at 12:11 AM Jack Pham <quic_jackp@xxxxxxxxxxx> wrote:
> > On Wed, Feb 08, 2023 at 07:27:04PM -0800, Jerry Zhang wrote:
> > (BTW I notice from these msm-dwc3 logs you must be using a Qualcomm SoC
> > with a downstream kernel.  Though I think the issue is generic enough to
> > debug with the upstream dwc3 gadget, if it does turn out to be some
> > vendor-specific behavior then I would ask you to contact us directly for
> > more focused support.)
> Yep the issue can be reproduced with a QRB5165 devkit plugged into a
> linux desktop using a cable with USB2 snipped. dwc3-msm in our kernel
> is identical to that in
> https://git.codelinaro.org/clo/la/kernel/msm-5.4.git.

Thanks for pointing out the product & kernel.  Again the dwc3-msm is
downstream only but since we are the vendor, I can try to provide some
feedback here on list before turning you over to our usual support
channels.

> > If possible please enable dwc3 tracing events as we might be able to see
> > more info about the specific events that occur when the host reboots.
> I did this by mounting tracefs and echo 1 > events/dwc3/enable and it
> produces a trace file, however the events at the end of the trace
> looks like

> These seem to be data events from the end of the connection, and I
> don't see any events related to suspend or power state.

>           <idle>-0       [006] d.s5   140.648282: dwc3_gadget_ep_cmd:
> ep1in: cmd 'Update Transfer' [30007] params 00000000 00000000 00000000
> --> status: Successful
>           <idle>-0       [000] dnh1   140.736735: dwc3_readl: addr
> 00000000f7508d19 value 00000004
>           <idle>-0       [000] dnh1   140.736739: dwc3_readl: addr
> 00000000967e799a value 00001000
>           <idle>-0       [000] dnh1   140.736741: dwc3_writel: addr
> 00000000967e799a value 80001000
>           <idle>-0       [000] dnh1   140.736743: dwc3_writel: addr
> 00000000f7508d19 value 00000004
>   kworker/u17:10-767     [002] d..1   140.736770: dwc3_event: event
> (00030601): End-Of-Frame [U3]
              ^^^^^^^^^^^^^^^^^
Here it is.  Despite the "End-Of-Frame" label, this is actually the
Suspend event that the controller received.  Commit 6f26ebb79a84 ("usb:
dwc3: gadget: Rename EOPF event macros to Suspend") fixed the label in
the tracing log.

Again, this is probably due to the controller having given up on
SuperSpeed, falling back to USB2, and detecting an idle condition on the
D+/D- lines.
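
For illustration, here is a minimal user-space sketch of how the raw
event word 00030601 from the trace breaks down.  It assumes the device
event bit layout and numbering of the upstream dwc3 driver (bit 0 set
and bits 7:1 zero for a device event, bits 11:8 for the event type, and
the low four bits of the event info at bits 19:16 for the link state);
please check those against core.h for your kernel.  The function and
table names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Names for the 4-bit device event type field (assumed upstream numbering). */
static const char *const devt_type_names[] = {
	[0] = "Disconnect", [1] = "Reset", [2] = "Connection Done",
	[3] = "Link Status Change", [4] = "WakeUp", [5] = "Hibernation Request",
	[6] = "Suspend",	/* older kernels print this as "End-Of-Frame" */
	[7] = "Start-Of-Frame", [9] = "Erratic Error",
	[10] = "Command Complete", [11] = "Overflow", [15] = NULL,
};

/* SuperSpeed link state names; in USB2 mode the same values mean other things. */
static const char *const ss_link_state_names[] = {
	[0] = "U0", [1] = "U1", [2] = "U2", [3] = "U3",
	[4] = "SS.Disabled", [5] = "Rx.Detect", [6] = "SS.Inactive",
	[7] = "Polling", [8] = "Recovery", [9] = "Hot Reset",
	[10] = "Compliance", [11] = "Loopback", [14] = "Reset", [15] = "Resume",
};

static_assert(sizeof(devt_type_names) / sizeof(devt_type_names[0]) == 16,
	      "one slot per 4-bit event type");
static_assert(sizeof(ss_link_state_names) / sizeof(ss_link_state_names[0]) == 16,
	      "one slot per 4-bit link state");

/* Bit 0 set means "not an endpoint event"; bits 7:1 == 0 selects device events. */
static int is_device_event(uint32_t raw)
{
	return (raw & 0x1) && ((raw >> 1) & 0x7f) == 0;
}

/* Event type lives in bits 11:8. */
static unsigned int devt_type(uint32_t raw)
{
	return (raw >> 8) & 0xf;
}

/* Link state is the low nibble of the event info field, which starts at bit 16. */
static unsigned int devt_link_state(uint32_t raw)
{
	return (raw >> 16) & 0xf;
}

/* Format "<event> [<link state>]" into buf; returns -1 if not a device event. */
static int describe_devt(uint32_t raw, char *buf, size_t len)
{
	const char *type, *state;

	if (!is_device_event(raw))
		return -1;

	type = devt_type_names[devt_type(raw)];
	state = ss_link_state_names[devt_link_state(raw)];
	return snprintf(buf, len, "%s [%s]",
			type ? type : "Unknown", state ? state : "Unknown");
}
```

Passing 0x00030601 to describe_devt() yields event type 6 and link
state 3, that is, "Suspend [U3]" with the renamed macro.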

> > :) there is a reason that spec compliant USB3.x implementations must
> > also provide D+/D- connectivity; not only for backwards compatibility
> > but also for these sorts of fallback scenarios.
> Understood, we knew we were getting into sketchy territory with this
> but we're actually port splitting on the host side and using that USB2
> slot for a different device, which helps us avoid the need for a hub.
> For embedded systems with a fixed topology, this strategy has a lot of
> advantages if we can get it working.

Understood.  Neat idea if it can work.

> > You'd somehow need to get the controller to go back into Rx.Detect.
> > Since you don't have a way to do USB2 reset on D+/D-, you may need to at
> > least simulate a VBUS toggle, or forcefully reset the dwc3 controller.
> >
> > With the QCOM HW there is a register that can do this.  Please refer to
> > dwc3_qcom_vbus_override_enable() in dwc3-qcom.c for the upstream
> > implementation.
> The equivalent of this is already being called in dwc3-msm.c as
> dwc3_override_vbus_status, except for missing the SW_SESSVLD_SEL flag,
> but I added that and I didn't notice any difference.

Did you toggle it off and then on, or did you only call it with 'true'?
Toggling off first should signal to the controller the equivalent of a
VBUS disconnect.  The thought is that it would correctly generate a
Disconnect event (please confirm in the tracing logs) and allow the
controller to return to a nominal state, such that toggling the VBUS
override on again would allow it to start up again from Rx.Detect.
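
As a rough sketch of that off-then-on ordering, the following
user-space simulation models the override as bits in two fake
registers.  Everything here is an assumption made for illustration: the
struct, the function names, the delay hook, and the bit positions are
placeholders modeled loosely on the flag names mentioned in this
thread, not the actual dwc3-qcom or dwc3-msm code, and the length of
the "off" period is left to the caller because no value has been
established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions; verify against the real register map. */
#define UTMI_OTG_VBUS_VALID	(1u << 20)
#define SW_SESSVLD_SEL		(1u << 28)
#define LANE0_PWR_PRESENT	(1u << 24)

/* Fake register block standing in for the vendor scratch registers. */
struct fake_qscratch {
	uint32_t hs_phy_ctrl;
	uint32_t ss_phy_ctrl;
	uint32_t hs_history[4];	/* hs_phy_ctrl value after each override call */
	unsigned int calls;
};

/* Set or clear the software VBUS indication for both PHYs. */
static void vbus_override_enable(struct fake_qscratch *q, bool enable)
{
	assert(q != NULL);

	if (enable) {
		q->ss_phy_ctrl |= LANE0_PWR_PRESENT;
		q->hs_phy_ctrl |= UTMI_OTG_VBUS_VALID | SW_SESSVLD_SEL;
	} else {
		q->ss_phy_ctrl &= ~LANE0_PWR_PRESENT;
		q->hs_phy_ctrl &= ~(UTMI_OTG_VBUS_VALID | SW_SESSVLD_SEL);
	}

	if (q->calls < 4)
		q->hs_history[q->calls] = q->hs_phy_ctrl;
	q->calls++;
}

/*
 * Simulate a VBUS disconnect followed by a reconnect: drop the override,
 * optionally wait (delay_ms may be NULL), then assert it again.
 */
static void vbus_override_toggle(struct fake_qscratch *q,
				 void (*delay_ms)(unsigned int),
				 unsigned int off_ms)
{
	vbus_override_enable(q, false);
	if (delay_ms)
		delay_ms(off_ms);
	vbus_override_enable(q, true);
}
```

The point of the sketch is only the ordering: the override is cleared
before it is set again, so the controller sees a disconnect first
rather than a second 'true' on top of an already asserted override.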

> I'm assuming dwc3-msm and dwc3-qcom are different implementations
> targeting the same device?

Yes.

> I did manage to finally find a quirk that seems promising though. I
> see in dwc3-msm that resume_work is skipped if the enable_bus_suspend
> bit is not set
> 
>      case DWC3_CONTROLLER_NOTIFY_OTG_EVENT:
>          dev_err(mdwc->dev, "DWC3_CONTROLLER_NOTIFY_OTG_EVENT received\n");
>          if (dwc->enable_bus_suspend) {
>              mdwc->suspend = dwc->b_suspend;
>              queue_work(mdwc->dwc3_wq, &mdwc->resume_work);
>          }
>          break;
> 
> and indeed we don't have it set so I tried enabling
> snps,bus-suspend-enable.
<snip>
> and we get these messages when the host powers back up. I can verify
> that the timing of these changes depending on how long the host is
> held in reset, so it's definitely detecting the host here rather than
> hitting some time based event. All these events look correct though as
> it claims to be resuming, however there still isn't enumeration and
> the link state still is in U3. The last line still claims to be in the
> suspend state and this is probably what's preventing the resume from
> completing. Looking through the code it seems like it depends on the
> B_SESS_VLD  bit
> 
>      if (!test_bit(B_SESS_VLD, &mdwc->inputs)) {
>          dev_err(mdwc->dev, "BSUSP: !bsv\n");
>          mdwc->drd_state = DRD_STATE_IDLE;
>          cancel_delayed_work_sync(&mdwc->sdp_check);
>          dwc3_otg_start_peripheral(mdwc, 0);
> 
> so somehow this if statement isn't triggering. Does this seem like the
> right track?

I'm doubtful this would help much.  This enable_bus_suspend is another
downstream-only feature intended to allow the dwc3-msm to runtime
suspend during a legitimate (U3 or L2) bus suspend.  You are seeing the
additional logs for the out-of-band signal handler when the LFPS signal
is detected.  However, this assumes that the dwc3 controller itself was
correctly in a state to process the resume or wakeup event from said
signaling but I think you are still in the same boat as before --
superspeed is disabled and there is no activity on D+/D- to reset it.

I'm going to refrain from further discussing downstream-only code here
since it's out of scope for the mailing list, but hopefully it's also not as
relevant to your problem.  If the above suggestion (toggling VBUS
override off then on) doesn't help then please contact our customer
engineering channel so we can help you more on this specific product.

Thanks,
Jack


