On Thu, Aug 31, 2023, Alan Stern wrote:
> On Thu, Aug 31, 2023 at 02:43:51AM +0000, Thinh Nguyen wrote:
> > On Wed, Aug 30, 2023, Alan Stern wrote:
> > > On Wed, Aug 30, 2023 at 01:32:28AM +0000, Thinh Nguyen wrote:
> > > > That reminds me another thing, if the host (xhci in this case) does a
> > > > hard reset to the endpoint, it also resets the TRB pointer with dequeue
> > > > ep command. So, the transfer should not resume. It needs to be
> > > > cancelled. This xHCI behavior is the same for Windows and Linux.
> > >
> > > That's on the host side, right? How does this affect the gadget side?
> > >
> > > That is, cancelling a transfer on the host doesn't necessarily mean it
> > > has to be cancelled on the gadget. Does it have any implications at all
> > > for the gadget driver?
> >
> > There are 2 things that needs to be in sync'ed between host and device:
> > 1) The data sequence.
>
> You mean the USB-3 sequence number value?

Yes.

>
> > 2) The transfer.
> >
> > If host doesn't send CLEAR_FEATURE(halt_ep), best case scenario, the
> > data sequence does't match and the host issues usb reset after some
> > timeout because the packet won't go through.
>
> The data toggles in USB-2, which are analogous to the sequence numbers
> in USB-3, don't work the same way. When a USB-2 controller receives a
> data packet with the wrong sequence number, it sends an ACK response but
> otherwise ignores it. This prevents timeouts (but not other types of
> errors).
>
> > Worst case scenario, the
> > data sequence matches 0, and the wrong data is received causing
> > corruption.
> >
> > If the device doesn't cancel the transfer in response to
> > CLEAR_FEATURE(halt_ep), it may send/receive data of a different transfer
> > because the host doesn't resume where it left off, causing corruption.
> >
> > Base on the class protocol, the class driver and gadget driver know
> > what makes up a "transfer" and can appropriately cancel a transfer to
> > stay in sync.
>
> You're still thinking of UAS in particular, right? What I would expect
> to happen when there's a transaction error in a UAS data transfer, based
> on reading the UAS spec, is that the host would cancel the transfer on
> its side and send either an Abort Task or an I_T Nexus Reset task
> management request to the device (in addition to resetting the host
> endpoint and sending a Clear-Halt). I would not expect the host to hope
> that the device would abandon the transfer merely because it got the
> Clear-Halt.
>
> Does Windows really work this way? Does it not send a task management
> request? That would definitely seem to be against the intent of the
> spec, if not against the letter.

Unfortunately yes, I don't see any Task Management request aborting the
transfer.

>
> > > How does the gadget driver sync with the host if the class protocol
> > > doesn't say what should be done?
> > >
> > > Also, what if there is no active transfer? That is, what if the
> > > transaction that got an error on the host appeared to be successful on
> > > the gadget and it was the last transaction in the final transfer queued
> > > for the endpoint? How would the UDC driver notify the gadget driver in
> > > this situation?
> >
> > That's fine. If there's no active transfer, the gadget doesn't need to
> > cancel anything. As long as the host knows that the transfer did not
> > complete, it can retry and be in sync. For UASP, the host will send a
> > new MSC command to retry the failed transfer. ie. The host would
> > overwrite/re-read the transfer with the same transfer offset.
> >
> > The problem arises if the gadget attempts to resume the incomplete
> > transfer.
>
> Quite so. But would the host send a new MSC retry command before the
> failed command completes?

The host sends a new MSC command after the incomplete command failed.

>
> > > > This is observed in
> > > > UASP driver in Windows and how various consumer UASP devices handle it.
> > >
> > > I don't understand what you're saying here. How can you observe whether
> > > a transfer is cancelled in a consumer UAS device? And how does the
> > > consumer device resync with the host?
> >
> > You can see a hang if the transfer are out of sync. If the transfer
> > isn't cancelled, the device would only source/sink whatever the
> > remaining of the previous transfer but not enough to complete the new
> > transfer. The new transfer is seen as incomplete from host and thus the
> > hang and the usb reset.
> >
> > >
> > > > There no eqivalent of Bulk-Only Mass Storage Reset request from the
> > > > class protocol. We still have the USB analyzer traces for this.
> > >
> > > Can you post an example? Not necessarily in complete detail, but enough
> > > so that we can see what's going on.
> > >
> > > > Regardless whether the class protocol spells out how to handle the
> > > > transaction error, if there's transaction error, the host may send
> > > > CLEAR_FEATURE(halt_ep) as observed in Windows. The gadget driver needs
> > > > to know about it to cancel the active transfer and resync with the host.
> > >
> > > I'll be able to understand this better after seeing an example. Do you
> > > have any traces that were made for a High-speed connection (say, using
> > > a USB-2 cable)? It would probably be easier to follow than a SuperSpeed
> > > example.
> > >
> >
> > Unfortunately I only have LeCroy usb analyzer traces of Gen 2x1, not for
> > usb2 speed. It's a bit tricky converting it to text with all the proper
> > info to see all the context. If my explanation isn't clear, I'll try to
> > figure out how to proceed.
>
> I would appreciate seeing whatever you can provide.
>

Here's a snippet captured at the SCSI level showing a Samsung T7 device's
response to CLEAR_FEATURE(halt-ep) on the IN data endpoint from the host
(Windows 10). Similar behavior is observed for the OUT endpoint.

_______|_______________________________________________________________________
SCSI Op(80) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x0928E800) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.335 ms) Time Stamp(10 . 000 538 006) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(81) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x0928EC00) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.318 ms) Time Stamp(10 . 001 872 988) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(82) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x0928F000) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.343 ms) Time Stamp(10 . 003 191 188) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(83) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x0928F400) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.256 ms) Time Stamp(10 . 004 534 630) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(84) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x0928F800) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.178 ms) Time Stamp(10 . 005 791 128) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(85) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x0928FC00) Data(146432 bytes) Status(Missing)-BAD
_______| Time( 2.681 ms) Time Stamp(10 . 006 968 662) Metrics #Xfers(2)
_______|_______________________________________________________________________

## Transaction error occurs here.

Transfer(289) Left("Left") G2(x1) Control(SET) ADDR(3) ENDP(0)
_______| bRequest(CLEAR_FEATURE) wValue(ENDPOINT_HALT) wLength(0)
_______| Time(166.322 us) Time Stamp(10 . 009 649 516)
_______|_______________________________________________________________________

## CLEAR_FEATURE happens here.

SCSI Op(99) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x09290000) RESPONSE_CODE(OVERLAPPED TAG)
_______| Time(365.854 us) Time Stamp(10 . 009 815 838) Metrics #Xfers(2)
_______|_______________________________________________________________________
SCSI Op(100) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x09290400) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.012 sec) Time Stamp(10 . 010 181 692) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(101) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x0928FC00) STATUS(GOOD) Data(524288 bytes)
_______| Time(882.412 us) Time Stamp(11 . 022 469 104) Metrics #Xfers(3)
_______|_______________________________________________________________________

## Host retries transfer here. Check logical block address.

SCSI Op(102) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x09290000) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.060 ms) Time Stamp(11 . 023 351 516) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(103) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x09290800) STATUS(GOOD) Data(524288 bytes)
_______| Time( 1.013 ms) Time Stamp(11 . 024 411 510) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(104) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x09290C00) STATUS(GOOD) Data(524288 bytes)
_______| Time(816.594 us) Time Stamp(11 . 025 424 600) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(105) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x09291000) STATUS(GOOD) Data(524288 bytes)
_______| Time(762.286 us) Time Stamp(11 . 026 241 194) Metrics #Xfers(3)
_______|_______________________________________________________________________
SCSI Op(106) ADDR(3) Tag(0x0002) SCSI CDB READ(10)
_______| Logical Block Addr(0x09291400) STATUS(GOOD) Data(524288 bytes)
_______| Time(768.696 us) Time Stamp(11 . 027 003 480) Metrics #Xfers(3)
_______|_______________________________________________________________________

BR,
Thinh
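
P.S. To make the out-of-sync case concrete, here is a rough Python sketch of
what happens when a device keeps the stale transfer after
CLEAR_FEATURE(halt_ep). It is a toy model, not driver or UDC code:
ModelDevice, Transfer and retry_bytes are hypothetical names made up for this
illustration. Only the byte counts and the LBA come from Op(85) in the trace,
and the model skips the OVERLAPPED TAG response and goes straight to the
retry.

```python
from dataclasses import dataclass
from typing import Optional

XFER_LEN = 524288         # READ(10) data length seen in the trace
SENT_BEFORE_ERR = 146432  # bytes delivered by Op(85) before the error
LBA = 0x0928FC00          # LBA of the failed READ(10)


@dataclass
class Transfer:
    lba: int
    length: int    # bytes the host asked for
    sent: int = 0  # bytes the device has sourced so far


class ModelDevice:
    """Toy IN-endpoint model (hypothetical, not a real gadget API)."""

    def __init__(self, cancel_on_clear_halt: bool) -> None:
        self.cancel_on_clear_halt = cancel_on_clear_halt
        self.active: Optional[Transfer] = None

    def start_read(self, lba: int, length: int) -> None:
        # A device still holding a stale transfer keeps servicing it
        # instead of starting the newly requested one.
        if self.active is None:
            self.active = Transfer(lba, length)

    def source(self, max_bytes: int) -> int:
        # Send up to max_bytes of whatever transfer is currently active.
        if self.active is None:
            return 0
        n = min(max_bytes, self.active.length - self.active.sent)
        self.active.sent += n
        if self.active.sent == self.active.length:
            self.active = None
        return n

    def clear_halt(self) -> None:
        # The behavior under discussion: drop the active transfer or not.
        if self.cancel_on_clear_halt:
            self.active = None


def retry_bytes(cancel_on_clear_halt: bool) -> int:
    """Bytes the host receives for the retried READ(10)."""
    dev = ModelDevice(cancel_on_clear_halt)
    dev.start_read(LBA, XFER_LEN)
    dev.source(SENT_BEFORE_ERR)    # transaction error after this point
    dev.clear_halt()               # host sends CLEAR_FEATURE(halt_ep)
    dev.start_read(LBA, XFER_LEN)  # host retries from the same LBA
    return dev.source(XFER_LEN)
```

With cancellation the retry gets the full 524288 bytes. Without it, the
device only sources the 377856 bytes left over from the previous transfer, so
the host sees the new transfer as incomplete, which is the hang described
earlier in the thread.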