On Tue, Oct 18, 2022, Michael Grzeschik wrote:
> Hi Thinh,
>
> On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > From: Jeff Vanhoof <qjv001@xxxxxxxxxxxx>
> > > > >
> > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > request instead where it will eventually be released when the
> > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > >
> > > > This doesn't sound right. How did you determine that the hardware is
> > > > still using the data associated with the TRB? Did you check the TRB's
> > > > HWO bit?
> > >
> > > The problem we're seeing was mentioned in the summary of this patch
> > > series, issue #1. Basically, with the following patch
> > > https://urldefense.com/v3/__https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@xxxxxxxxxxxxxx/__;!!A4F2R9G_pg!aSNZ-IjMcPgL47A4NR5qp9qhVlP91UGTuCxej5NRTv8-FmTrMkKK7CjNToQQVEgtpqbKzLU2HXET9O226AEN$
> > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > kernel which is:
> > >
> > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > >
> > > The uvc gadget driver appears to be the first (and only) gadget that
> > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > the dwc3 driver.
> > > In our configuration, we have up to 64 requests and the
> > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > would get up to that amount when looping through to cleanup the
> > > completed requests. From testing and debugging the smmu panic occurs
> > > when a -EXDEV status shows up and right after
> > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > we had was the requests were getting returned to the gadget too early.
> >
> > As I mentioned, if the status is updated to missed isoc, that means that
> > the controller returned ownership of the TRB to the driver. At least for
> > the particular request with -EXDEV, its TRBs are completed. I'm not
> > clear on your conclusion.
> >
> > Do we know where the crash occurred? Is it from dwc3 driver or from uvc
> > driver, and at what line? It'd be great if we can see the driver log.
> >
> > >
> > > >
> > > > The dwc3 driver would only give back the requests if the TRBs of the
> > > > associated requests are completed or when the device is disconnected.
> > > > If the TRB indicated missed isoc, that means that the TRB is completed
> > > > and its status was updated.
> > >
> > > Interesting, the device is not disconnected as we don't get the
> > > -ESHUTDOWN status back and with this patch in place things continue
> > > after a -EXDEV status is received.
> > >
> >
> > Actually, minor correction here: a recent change
> > b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
> > changed -ESHUTDOWN request status to -ECONNRESET when disabling an endpoint.
> > This doesn't look right.
> >
> > While disabling endpoint may also apply for other cases such as
> > switching alternate interface in addition to disconnect, -ESHUTDOWN
> > seems more fitting there.
> >
> > Hi Michael,
> >
> > Can you help clarify for the change above? This changed the usage of
> > requests. Now requests returned by disconnection won't be returned as
> > -ESHUTDOWN.
>
> When writing the patch, I was looking into
> Documentation/driver-api/usb/error-codes.rst.
>
> After looking into it today, I see that ESHUTDOWN should be sent on
> ep_disable (device disable) and ECONNRESET on stop_active_transfer.
> So I probably just mixed them up, while writing the patch. :/
>

I think you mean ECONNRESET for ep_dequeue()? dwc3_stop_active_transfer() is
called for both scenarios.

> The followup patch would then just be to swap the status results of
> __dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
> dwc3_remove_requests call.
>
> Michael

Can you help make a fix?

Thanks!
Thinh