Hi, Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes: >>>>>> allocate io_data (ffs) >>>>>> --> usb_ep_queue() >>>>>> --> __dwc3_gadget_kick_transfer() >>>>>> --> dwc3_send_gadget_ep_cmd(EINVAL) >>>>>> --> dwc3_gadget_ep_cleanup_cancelled_requests() >>>>>> --> dwc3_gadget_giveback(ECONNRESET) >>>>>> ffs completion callback >>>>>> queue work item within io_data >>>>>> --> usb_ep_queue returns EINVAL >>>>>> ffs frees io_data >>>>>> ... >>>>>> >>>>>> work scheduled >>>>>> --> NULL pointer/memory fault as io_data is freed >>>> >>>> Hi Thinh, >>>> >>>>> >>>>> sounds like a race issue. >>>>> >>>> >>>> It'll always happen if usb_ep_queue() fails with an error. Sorry for not >>>> clarifying, but the "..." represents executing in a different context >>>> :). Anything above the "..." is in the same context. >>>>>> >>>>>>> BTW, what kinds of command and error do you see in your setup and for >>>>>>> what type endpoint? I'm thinking of letting the function driver to >>>>>>> dequeue the requests instead of letting dwc3 automatically >>>>>>> ending/cancelling the queued requests. However, it's a bit tricky to do >>>>>>> that if the error is -ETIMEDOUT since we're not sure if the controller >>>>>>> had already cached the TRBs. >>>>>>> >>>>>> >>>>>> Happens on bulk EPs so far, but I think it wouldn't matter as long as >>>>>> its over the FFS interface. (and using async IO transfers) >>>>> >>>>> Do you know which command and error code? It's strange if >>>>> UPDATE_TRANSFER command failed. >>>>> >>>> >>>> Sorry for missing that part of the question. It is a no xfer resource >>>> error on a start transfer command. So far this happens on low system >>>> memory test cases, so there may be some sequences that were missed, >>>> which led to this particular command error. >>>> >>>> Thanks >>>> Wesley Cheng >> >> Hi Thinh, >> >>> >>> No xfer resource usually means that the driver attempted to send >>> START_TRANSFER without waiting for END_TRANSFER command to complete. >>> This may be a dwc3 driver issue. Did you check this? >>> >>> Thanks, >>> Thinh >>> >>> >> >> Yes, we know the reason why this happens, and its due to one of the >> downstream changes we had that led to the scenario above. Although, >> that has been fixed, I still believe the error path is a potential >> scenario we'd still want to address. >> >> I think the returning success always on dwc3_gadget_ep_queue(), and >> allowing the error in the completion handler/giveback at the function >> driver level to do the cleanup is a feasible solution. Doesn't change >> the flow of the DWC3 gadget, and so far all function drivers we've used >> handle this in the correct manner. >> >> Thanks >> Wesley Cheng > > Right. I think for now we should do that (return success always except > for cases of disconnect or already in-flight etc). This helps keeping it no, let's not lie to our users ;-) > simple and avoid some pitfalls dealing with giving back the request. > Currently we return the error status to dwc3_gadget_ep_queue if we > failed to send a command that may not even related to the same request > being queued. I think the fix should be simple, but we're trying to patch it in the wrong way. Can y'all comment on my suggestion on the other subthread? -- balbi
Attachment:
signature.asc
Description: PGP signature