Hi Wesley, Wesley Cheng wrote: > > > On 5/3/2021 7:20 PM, Thinh Nguyen wrote: >> Hi, >> >> Wesley Cheng wrote: >>> If an error is received when issuing a start or update transfer >>> command, the error handler will stop all active requests (including >>> the current USB request), and call dwc3_gadget_giveback() to notify >>> function drivers of the requests which have been stopped. Avoid >>> having to cancel the current request which is trying to be queued, as >>> the function driver will handle the EP queue error accordingly. >>> Simply unmap the request as it was done before, and allow previously >>> started transfers to be cleaned up. >>> > > Hi Thinh, > >> >> It looks like you're still letting dwc3 stopping and cancelling all the >> active requests instead letting the function driver doing the dequeue. >> > > Yeah, main issue isn't due to the function driver doing dequeue, but > having cleanup (ie USB request free) if there is an error during > usb_ep_queue(). > > The function driver in question at the moment is the f_fs driver in AIO > mode. When async IO is enabled in the FFS driver, every time it queues > a packet, it will allocate a io_data struct beforehand. If the > usb_ep_queue() fails it will free this io_data memory. Problem is that, > since the DWC3 gadget calls the completion with -ECONNRESET, the FFS > driver will also schedule a work item (within io_data struct) to handle > the completion. So you end up with a flow like below > > allocate io_data (ffs) > --> usb_ep_queue() > --> __dwc3_gadget_kick_transfer() > --> dwc3_send_gadget_ep_cmd(EINVAL) > --> dwc3_gadget_ep_cleanup_cancelled_requests() > --> dwc3_gadget_giveback(ECONNRESET) > ffs completion callback > queue work item within io_data > --> usb_ep_queue returns EINVAL > ffs frees io_data > ... > > work scheduled > --> NULL pointer/memory fault as io_data is freed sounds like a race issue. > >> BTW, what kinds of command and error do you see in your setup and for >> what type endpoint? I'm thinking of letting the function driver to >> dequeue the requests instead of letting dwc3 automatically >> ending/cancelling the queued requests. However, it's a bit tricky to do >> that if the error is -ETIMEDOUT since we're not sure if the controller >> had already cached the TRBs. >> > > Happens on bulk EPs so far, but I think it wouldn't matter as long as > its over the FFS interface. (and using async IO transfers) Do you know which command and error code? It's strange if UPDATE_TRANSFER command failed. > >> This seems to add more complexity and I don't have a good solution to >> it. Since you're already cancelling all the active request anyway, what >> do you think of always letting dwc3_gadget_ep_queue() to go through with >> success, but report failure through request completion? >> > > We do have something similar as well downstream (returning success > always on dwc3_gadget_ep_queue()) and its been working for us also. > Problem is we don't test the ISOC path much, so this is the only type of > EP that might come into question... > It should be similiar with isoc. I can't think of a potential issue yet. > Coming up with a way to address the concerns you brought up was a bit > difficult as there were scenarios we needed to consider. next_request() > doesn't always have to be the request being queued (even if ep queue > triggered it). There was no easy way to determine if kick transfer was > due to ep queue, but even if there was, we'd need to remember the > previous point as well. > Yeah, there are a few pitfalls. I don't have a good solution to it if we want to return failure immediately and let the function driver handle the dequeue (if it wants to). Thanks, Thinh