On 5/5/2021 8:19 AM, Alan Stern wrote: > On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote: >> >> Hi, >> >> Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes: >>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >>>> Hi, >>>> >>>> Wesley Cheng wrote: >>>>> If an error is received when issuing a start or update transfer >>>>> command, the error handler will stop all active requests (including >>>>> the current USB request), and call dwc3_gadget_giveback() to notify >>>>> function drivers of the requests which have been stopped. Avoid >>>>> having to cancel the current request which is trying to be queued, as >>>>> the function driver will handle the EP queue error accordingly. >>>>> Simply unmap the request as it was done before, and allow previously >>>>> started transfers to be cleaned up. >>>>> >>> >>> Hi Thinh, >>> >>>> >>>> It looks like you're still letting dwc3 stopping and cancelling all the >>>> active requests instead letting the function driver doing the dequeue. >>>> >>> >>> Yeah, main issue isn't due to the function driver doing dequeue, but >>> having cleanup (ie USB request free) if there is an error during >>> usb_ep_queue(). >>> >>> The function driver in question at the moment is the f_fs driver in AIO >>> mode. When async IO is enabled in the FFS driver, every time it queues >>> a packet, it will allocate a io_data struct beforehand. If the >>> usb_ep_queue() fails it will free this io_data memory. Problem is that, >>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS >>> driver will also schedule a work item (within io_data struct) to handle >>> the completion. So you end up with a flow like below >>> >>> allocate io_data (ffs) >>> --> usb_ep_queue() >>> --> __dwc3_gadget_kick_transfer() >>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>> --> dwc3_gadget_giveback(ECONNRESET) >>> ffs completion callback >>> queue work item within io_data >>> --> usb_ep_queue returns EINVAL >>> ffs frees io_data >>> ... >>> >>> work scheduled >>> --> NULL pointer/memory fault as io_data is freed Hi Alan, > > Am I reading this correctly? It looks like usb_ep_queue() returns a > -EINVAL error, but then the completion callback gets invoked with a > status of -ECONNRESET. > Yes, your understanding of the current behavior is correct. In this situation we'd get a completion callback with ECONNRESET, and then also return EINVAL to usb_ep_queue(). >> I have some vague memory of discussing (something like) this with Alan >> Stern long ago and the conclusion was that the gadget driver should >> handle cases such as this. > > Indeed, this predates the creation of the Gadget API; the same design > principle applies to the host-side API. It's a very simple idea: > > If an URB or usb_request submission succeeds, it is guaranteed > that the completion routine will be called when the transfer is > finished, cancelled, aborted, or whatever (note that this may > happen before the submission call returns). > > If an URB or usb_request submission fails, it is guaranteed that > the completion routine will not be called. > > So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the > completion handler is called), this is a bug Thanks for this. So we're trying to only allow one or the other to happen right now. (either completion with an error, or returning error for usb_ep_queue()) I think that would be OK then. Thanks Wesley Cheng > > Alan Stern > >> OTOH, we're returning failure during >> usb_ep_queue() which tells me there's something with dwc3 (perhaps not >> exclusively, but that's yet to be shown). >> >> If I understood the whole thing correctly, we want everything except the >> current request (the one that failed START or UPDATE transfer) to go >> through giveback(). This really tells me that we're not handling error >> case in kick_transfer and/or prepare_trbs() correctly. >> >> I also don't want to pass another argument to kick_transfer because it >> should be unnecessary: the current request should *always* be the last >> one in the list. Therefore we should rely on something like >> list_last_entry() followed by list_for_each_entry_safe_reverse() to >> handle this without a special case. >> >> ret = dwc3_send_gadget_ep_cmd(); >> if (ret < 0) { >> current = list_last_entry(); >> >> unmap(current); >> for_each_trb_in(current) { >> clear_HWO(trb); >> } >> >> list_for_entry_safe_reverse() { >> move_cancelled(); >> } >> } >> >> -- >> balbi > > -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project