On 4/6/2023 7:44 AM, Krishna Kurapati PSSNV wrote:
On 4/6/2023 6:15 AM, Thinh Nguyen wrote:
On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote:
On 4/5/2023 3:13 AM, Thinh Nguyen wrote:
On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote:
On 4/4/2023 5:19 AM, Thinh Nguyen wrote:
On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote:
On 3/30/2023 5:40 AM, Thinh Nguyen wrote:
On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote:
On 3/29/2023 2:50 AM, Thinh Nguyen wrote:
Hi,
On Tue, Mar 28, 2023, Krishna Kurapati wrote:
If the core soft reset timeout happens, avoid setting up event
buffers and starting gadget as the writes to these registers
may not reflect when in reset and setting the run stop bit
can lead the controller to access wrong event buffer address
resulting in a crash.
Signed-off-by: Krishna Kurapati <quic_kriskura@xxxxxxxxxxx>
---
drivers/usb/dwc3/gadget.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c
b/drivers/usb/dwc3/gadget.c
index 3c63fa97a680..f0472801d9a5 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct
usb_gadget *g, int is_on)
* device-initiated disconnect requires a core
soft reset
* (DCTL.CSftRst) before enabling the run/stop
bit.
*/
- dwc3_core_soft_reset(dwc);
+ ret = dwc3_core_soft_reset(dwc);
+ if (ret)
+ goto done;
dwc3_event_buffers_setup(dwc);
__dwc3_gadget_start(dwc);
ret = dwc3_gadget_run_stop(dwc, true, false);
}
+done:
pm_runtime_put(dwc->dev);
return ret;
--
2.40.0
I think there's one more place that may needs this check. Can
you also
add this check in __dwc3_set_mode()?
Hi Thinh,
Sure. Will do it.
Will the below be good enough ? Or would it be good to add an
error/warn log
there>
There's already a warning message in dwc3_core_soft_reset() if
it fails.
kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$
git diff drivers/usb/
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 476b63618511..8d1d213d1dcd 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct
work_struct *work)
}
break;
case DWC3_GCTL_PRTCAP_DEVICE:
- dwc3_core_soft_reset(dwc);
+ ret = dwc3_core_soft_reset(dwc);
+ if (ret)
+ goto out;
dwc3_event_buffers_setup(dwc);
If soft-reset failed, the controller is in a bad state. We
should not
perform any further operation until the next hard reset. We
should flag
the controller as dead. I don't think we have the equivalent of the
host's HCD_FLAG_DEAD. It may require some work in the UDC core.
Perhaps
we can flag within dwc3 for now and prevent any further
operation for a
simpler fix.
Hi Thinh,
Are you referring that if __dwc3_set_mode failed with core
soft reset
timing out, the caller i.e., dwc3_set_mode who queues the work
need to know
that the operation actually failed. So we can add a flag to
indicate that
gadget is dead and the caller of dwc3_set_mode can check the flag
to see if
the operation is successful or not.
Or am I misunderstanding your comment ?
Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset
fails, then we set this flag. So that it can prevent the user calling
any gadget ops causing more crashes/invalid behavior. The
dwc->softconnect is already wrong on pullup() on failure.
So that we can have a check in different gadget ops. For pullup():
static int dwc3_gadget_pullup() {
if (dwc->udc_is_dead) {
dev_err(dev, "reset me. x_x \n");
return;
}
abc();
}
Perhaps the effort is probably the same if we enhance the UDC core
for
this? In any case, I'm fine either way.
Thanks,
Thinh
Hi Thinh,
So you don't want UDC to retry pullup if it fails the first time
? As per
patch-2 of this series, I was trying to propagate the EITMEDOUT to
UDC so
that the caller (most probably configfs) can take appropriate
action as to
whether it must retry pullup or not.
Now I'm confused. If the soft-reset times out, that means that the
soft-reset (self-clearing) bit isn't cleared. How can we retry if it's
stuck in this state? My impression is that soft-reset would not
complete
at all. Is that not the case for you, or it's simply because we need a
longer soft-reset timeout?
Thanks,
Thinh
Hi Thinh,
Sorry for not being clear. The intention of patch-1 was to ensure
we don't
start the controller if reset times out and patch-2 was to ensure
that UDC
is in sync with controller by understanding that gadget_connect has
failed
and necessary cleanup has to be done in gadget_bind_driver.
That should still be there.
But usually since the UDC_store is the one that is causing pullup to be
called, the error value is propagated back to UDC_store. If it sees a
failure, it does a retry to pullup.
I didn't check whether subsequent retries by UDC to pullup are helping
clear the reset bit or not. But I thought retrying pullup won't be of
any
harm.
It's fine to retry. I'm thinking that the controller is in a bad state
at this point that we can't recover (hopefully that's not the case).
I now get that my patches are incomplete w.r.t handling the timeout.
IIRC one of the following is what you are suggesting we need to do:
Option-1:
Set a flag when reset times out and clear it upon core_exit /
core_init. If
the flag is set, block calls to all the gadget_ops in dwc3. Basically
even
if retry happens from configfs/UDC, we bail out in pullup/udc_start even
without trying the requested gadget operation.
Option-2:
If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and
implement the same flag in UDC and don't even call any gadget_ops.
Hi Thinh,
Thanks for the review.
I'm thinking of option-1. For option-2, we can't control if the
gadget_ops will be called. We only have control how we will respond in
case they get called again.
But now I'm thinking again, I think it may be ok without adding the
flag. The UDC core and gadget driver won't do anything else before
pullup(1) is successful. Calling other gadget_ops should be harmless
(ie. it won't crash/break the system)?
I can give this a try in long run testing (For 7-14 days) and see if
anything else is breaking.
Most probably we do a composition switch / PIPO in between which can
call usb_gadget_unregister_driver which might invoke a pullup(0)
followed by udc_stop() and like you mentioned must not be a problem.
Sorry for the noise, but I think it may be ok without marking the
controller dead. I wonder if we can confirm my suspiction on retry? I
believe this is not easy to reproduce on your setup? If not, I think we
can take your change as is.
Hi Thinh,
I got this patch tested on two diff Gen-2 targets for around 10 days
and no issues were seen. (No SMMU fault seen on a 10 day run). Let me
know of any other concerns that might come up with this patch. Else I
can rebase it to get merged.
Regards,
Krishna,