On Wed, Apr 05, 2023, Krishna Kurapati PSSNV wrote: > > > On 4/5/2023 3:13 AM, Thinh Nguyen wrote: > > On Tue, Apr 04, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > On 4/4/2023 5:19 AM, Thinh Nguyen wrote: > > > > On Thu, Mar 30, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > On 3/30/2023 5:40 AM, Thinh Nguyen wrote: > > > > > > On Wed, Mar 29, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > > > > > > > > > > On 3/29/2023 2:50 AM, Thinh Nguyen wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > On Tue, Mar 28, 2023, Krishna Kurapati wrote: > > > > > > > > > If the core soft reset timeout happens, avoid setting up event > > > > > > > > > buffers and starting gadget as the writes to these registers > > > > > > > > > may not reflect when in reset and setting the run stop bit > > > > > > > > > can lead the controller to access wrong event buffer address > > > > > > > > > resulting in a crash. > > > > > > > > > > > > > > > > > > Signed-off-by: Krishna Kurapati <quic_kriskura@xxxxxxxxxxx> > > > > > > > > > --- > > > > > > > > > drivers/usb/dwc3/gadget.c | 5 ++++- > > > > > > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > > > > > > > > > index 3c63fa97a680..f0472801d9a5 100644 > > > > > > > > > --- a/drivers/usb/dwc3/gadget.c > > > > > > > > > +++ b/drivers/usb/dwc3/gadget.c > > > > > > > > > @@ -2620,13 +2620,16 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > > > > > > > > > * device-initiated disconnect requires a core soft reset > > > > > > > > > * (DCTL.CSftRst) before enabling the run/stop bit. > > > > > > > > > */ > > > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > > > + if (ret) > > > > > > > > > + goto done; > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > __dwc3_gadget_start(dwc); > > > > > > > > > ret = dwc3_gadget_run_stop(dwc, true, false); > > > > > > > > > } > > > > > > > > > +done: > > > > > > > > > pm_runtime_put(dwc->dev); > > > > > > > > > return ret; > > > > > > > > > -- > > > > > > > > > 2.40.0 > > > > > > > > > > > > > > > > > > > > > > > > > I think there's one more place that may needs this check. Can you also > > > > > > > > add this check in __dwc3_set_mode()? > > > > > > > > > > > > > > Hi Thinh, > > > > > > > > > > > > > > Sure. Will do it. > > > > > > > Will the below be good enough ? Or would it be good to add an error/warn log > > > > > > > there> > > > > > > > > > > > > There's already a warning message in dwc3_core_soft_reset() if it fails. > > > > > > > > > > > > > > > > > > > > kriskura@hu-kriskura-hyd:/local/mnt/workspace/krishna/skales2/skales/kernel$ > > > > > > > git diff drivers/usb/ > > > > > > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > > > > > > > index 476b63618511..8d1d213d1dcd 100644 > > > > > > > --- a/drivers/usb/dwc3/core.c > > > > > > > +++ b/drivers/usb/dwc3/core.c > > > > > > > @@ -210,7 +210,9 @@ static void __dwc3_set_mode(struct work_struct *work) > > > > > > > } > > > > > > > break; > > > > > > > case DWC3_GCTL_PRTCAP_DEVICE: > > > > > > > - dwc3_core_soft_reset(dwc); > > > > > > > + ret = dwc3_core_soft_reset(dwc); > > > > > > > + if (ret) > > > > > > > + goto out; > > > > > > > > > > > > > > dwc3_event_buffers_setup(dwc); > > > > > > > > > > > > > > > > > > > If soft-reset failed, the controller is in a bad state. We should not > > > > > > perform any further operation until the next hard reset. We should flag > > > > > > the controller as dead. I don't think we have the equivalent of the > > > > > > host's HCD_FLAG_DEAD. It may require some work in the UDC core. Perhaps > > > > > > we can flag within dwc3 for now and prevent any further operation for a > > > > > > simpler fix. > > > > > > > > > > > Hi Thinh, > > > > > > > > > > Are you referring that if __dwc3_set_mode failed with core soft reset > > > > > timing out, the caller i.e., dwc3_set_mode who queues the work need to know > > > > > that the operation actually failed. So we can add a flag to indicate that > > > > > gadget is dead and the caller of dwc3_set_mode can check the flag to see if > > > > > the operation is successful or not. > > > > > > > > > > Or am I misunderstanding your comment ? > > > > > > > > > > > > > Not just in __dwc3_set_mode(). I mean any time dwc3_core_soft_reset > > > > fails, then we set this flag. So that it can prevent the user calling > > > > any gadget ops causing more crashes/invalid behavior. The > > > > dwc->softconnect is already wrong on pullup() on failure. > > > > > > > > So that we can have a check in different gadget ops. For pullup(): > > > > > > > > static int dwc3_gadget_pullup() { > > > > if (dwc->udc_is_dead) { > > > > dev_err(dev, "reset me. x_x \n"); > > > > return; > > > > } > > > > > > > > abc(); > > > > } > > > > > > > > Perhaps the effort is probably the same if we enhance the UDC core for > > > > this? In any case, I'm fine either way. > > > > > > > > Thanks, > > > > Thinh > > > > > > Hi Thinh, > > > > > > So you don't want UDC to retry pullup if it fails the first time ? As per > > > patch-2 of this series, I was trying to propagate the EITMEDOUT to UDC so > > > that the caller (most probably configfs) can take appropriate action as to > > > whether it must retry pullup or not. > > > > > > > Now I'm confused. If the soft-reset times out, that means that the > > soft-reset (self-clearing) bit isn't cleared. How can we retry if it's > > stuck in this state? My impression is that soft-reset would not complete > > at all. Is that not the case for you, or it's simply because we need a > > longer soft-reset timeout? > > > > Thanks, > > Thinh > > Hi Thinh, > > Sorry for not being clear. The intention of patch-1 was to ensure we don't > start the controller if reset times out and patch-2 was to ensure that UDC > is in sync with controller by understanding that gadget_connect has failed > and necessary cleanup has to be done in gadget_bind_driver. That should still be there. > > But usually since the UDC_store is the one that is causing pullup to be > called, the error value is propagated back to UDC_store. If it sees a > failure, it does a retry to pullup. > > I didn't check whether subsequent retries by UDC to pullup are helping > clear the reset bit or not. But I thought retrying pullup won't be of any > harm. It's fine to retry. I'm thinking that the controller is in a bad state at this point that we can't recover (hopefully that's not the case). > > I now get that my patches are incomplete w.r.t handling the timeout. > > IIRC one of the following is what you are suggesting we need to do: > > Option-1: > Set a flag when reset times out and clear it upon core_exit / core_init. If > the flag is set, block calls to all the gadget_ops in dwc3. Basically even > if retry happens from configfs/UDC, we bail out in pullup/udc_start even > without trying the requested gadget operation. > > Option-2: > If gadget_connect fails with -ETIMEDOUT in UDC, handle the failure and > implement the same flag in UDC and don't even call any gadget_ops. > I'm thinking of option-1. For option-2, we can't control if the gadget_ops will be called. We only have control how we will respond in case they get called again. But now I'm thinking again, I think it may be ok without adding the flag. The UDC core and gadget driver won't do anything else before pullup(1) is successful. Calling other gadget_ops should be harmless (ie. it won't crash/break the system)? Sorry for the noise, but I think it may be ok without marking the controller dead. I wonder if we can confirm my suspiction on retry? I believe this is not easy to reproduce on your setup? If not, I think we can take your change as is. Thanks, Thinh