On Sat, Aug 26, 2023 at 01:20:34AM +0000, Thinh Nguyen wrote: > Sorry for the delay response. > > On Wed, Aug 23, 2023, Alan Stern wrote: > > In uas, a -EPROTO error will cause an error status to be returned to the > > SCSI layer, which will invoke the SCSI error handler. After enough > > failures it will call the uas device-reset handler, and > > uas_eh_device_reset_handler() calls usb_reset_device(). > > From my testing with UASP, if I recall correctly, there's a 30 second > timeout before the reset handler kicks in. That timeout length comes from the SCSI error handler. I believe the user can control the length by setting a sysfs attribute. Also, it should be possible to change the uas driver to make it perform a reset on its own right away, the way usb-storage does, without waiting for the SCSI error handler -- if this matters. > From the xHCI spec, it suggests to issue a CLEAR_FEATURE(halt_ep) after > the endpoint reset (e.g. from transaction error). Whether this action > should be taken from the class driver or from the xHCI driver, it's not > clear. However, as you said, without Bulk-Only Mass Storage Reset > request, the host and device will be out of sync. The response action > taken from xHCI is independent from MSC protocol. So it would make sense > for the UDC driver to treat CLEAR_FEATURE(halt_ep) as such and not > expect a Bulk-Only Mass Storage Reset or the equivalent. In USB-2, performing an endpoint reset in the host controller together with sending a Clear-Halt is dangerous. It can lead to data duplication. Here's how that can happen. Suppose the data toggles on both the host and gadget side are initially equal to 0 when a bulk-OUT transaction occur. The host sends a DATA0 packet which the gadget receives, causing the gadget's data toggle to change to 1. But let's say the gadget's ACK handshake gets corrupted, causing a protocol error on the host. So the host does an internal endpoint reset and sends a Clear-Halt to the gadget. When the gadget processes this command, it resets its data toggle back to 0. Now the host resends the same DATA0 packet as before -- and the gadget accepts it the second time because its data toggle is set to 0! The duplicated data leads to corruption. If the gadget's data toggle had remained 1 then it would have acknowledged the duplicate DATA0 packet but otherwise ignored it, avoiding the corruption. I admit the probability of this happening is very low, but a more robust error recovery procedure at the class level would prevent this scenario. > > > The UDC driver needs to notify the gadget driver somehow, cancelling the > > > request and give it back is currently the way dwc3 handling it. > > > > And I'm saying that the UDC driver does _not_ need to notify the gadget > > driver. > > > > The MSC gadget driver works just fine without any such notification. > > Can you name any gadget driver that _does_ need a notification? > > > > We were testing against UASP. The reason I added this was to mimic the > behavior of common vendor UASP devices when it recovers from transaction > errors in Windows. > > When the data sequence of a transfer is reset, the host needs to send > CLEAR_FEATURE(halt_ep). It should be a common behavior. Since it is > common and part of the xHCI spec, do we expect the xHCI to send this or > do we expect the class protocol to document and handle this? At the > moment, I expect it to be the former and expect the UDC driver to treat > it as a common synchronization that perhaps the only synchronization > mechanism the upper protocol depends on. I think it should be the opposite; the class protocol should specify how to recover from errors. If for no other reason then to avoid the data duplication problem for USB-2. However, if it doesn't specify a recovery procedure then there's not much else you can do. But regardless, how can the gadget driver make any use of the knowledge that the UDC received a Clear-Halt? What would it do differently? If the intent is simply to clear an error condition and continue with the existing transfer, the gadget driver doesn't need to do anything. Alternatively, if the procedure for clearing an error condition is given by the class protocol then the protocol should spell out a method for the host to inform the gadget about what it is doing -- something more than just sending a Clear-Halt. Alan Stern