On Tue, 4 Dec 2018, Anurag Kumar Vulisha wrote: > >> Yes the host can cancel the transfer. This issue originated from the endpoints using > >bulk > >> streaming protocol and may not occur with normal endpoints. AFAIK bulk streaming > >is > >> gadget driven, where the gadget is allowed to select which stream id transfer the > >host > >> should work on . Since the host doesn't have control on when the transfer would be > >> selected by gadget, it may wait for longer timeouts before cancelling the transfer. > > > >You're missing the point. Although the device selects which stream ID > >gets transferred, the _host_ decides whether a stream transfer should > >occur in the first place. No matter how many ERDY packets the device > >controller tries to send, no transfer will occur until the host wants > >to do it. > > > >In this sense, stream transfers (like all other USB interactions except > >wakeup requests) are host-driven. > > > > I agree with you that without host willing to transfer, the transfer wouldn't have > started and also agree the host always has the capability of detecting the hang. But > we are not sure on what host platform and operating system the gadget would be > tested and also not sure whether the host software is capable of handling the deadlock. > Even if the host has a timer , it would be very long and waiting for the timer to get > timedout would reduce the performance. In this patch we are giving the user the > flexibility to opt the timeout value, so that the deadlock can be avoided without > effecting the performance. So I think adding the timer in gadget is more easier solution > than handling in host. I have seen this workaround working for both linux & windows. Is there any way for the device controller driver to detect the problem without relying on a timeout? In fact, is this problem a known bug in the existing device controller hardware? Have the controller manufacturers published a suggested scheme for working around it? > >> >> Since the gadget > >> >> controller driver is aware that the controller is stuck , makes it responsible > >> >> to recover the controller from hang condition by restarting the transfer (which > >> >> triggers the controller FSM to issue ERDY to host). > >> > > >> >Isn't there a cleaner way to recover than by cancelling the request and > >> >resubmitting it? > >> > > >> > >> dequeuing the request issues the stop transfer command to the controller, which > >> cleans all the hardware resource allocated for that endpoint. This also resets the > >> hardware FSMs for that endpoint . So, when re-queuing of the transfer happens > >> the controller starts allocating hardware resources again, thus avoiding the > >probability > >> of entering into the issue. I am not sure of other controllers, but for dwc3, issuing > >> the stop transfer is the only way to handle this issue. > > > >Again you're missing the point. Can't the controller driver issue the > >Stop Transfer command but still consider the request to be in progress > >(i.e., don't dequeue the request) so that the gadget driver's > >completion callback isn't invoked and the request does not need to be > >explicitly resubmitted? > > > > Yes , what you are saying is correct. My initial patches were following the > same approach that you have suggested. We switched to the dequeue > approach because there can be different gadgets which may enter into > this issue and it would be better to follow a generic approach rather than > having every controller driver implementing their own workaround. > Please find the conservation in this link https://patchwork.kernel.org/patch/10640145/ At this point, it seems that the generic approach will be messier than having every controller driver implement its own fix. At least, that's how it appears to me. > >> @Felipe: Can you please provide your suggestion on this. > > > >> >How can the gadget driver know what timeout to use? The host is > >> >allowed to be as slow as it wants; the gadget driver doesn't have any > >> >way to tell when the host wants to start the transfer. > >> > >> Yes , I agree with you that the timeout may vary from usage to usage. This timeout > >> should be decided by the class driver which queues the request. As discussed above > >> this issue was observed in streaming protocol , which is very much faster than > >normal > >> BOT modes and it works on super speed . > > > >Although USB mass storage is currently the only user of the stream > >protocol, that may not be true in the future. You should think in more > >general terms. A timeout which is appropriate for mass storage may not > >be appropriate for some other application. > > > > You are correct , this timeout may not be ideal. Since I was not able to reproduce this issue > on non-stream capable transfers , at present the only user of usb_ep_queue_timeout() API > is the streaming gadget. As we are not aware of the optimal timeout value, instead of > hardcoding the timeout value we can add module_param() for it. So that the user can configure > timeout based on their use case. Please let me know your suggestion on this. Ideally it would not be necessary to rely on a timeout at all. Also, maintainers dislike module parameters. It would be better not to add one. Alan Stern