Re: dwc3: unusual handling of setup requests with wLength == 0

Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> · Tue, 5 Sep 2023 22:53:15 +0000

On Sat, Sep 02, 2023, Alan Stern wrote:
> > # Host completely dropped the SCSI command with transaction error. It
> > # doesn't request for the status. Since it's dropped, tag 2 is
> > # available. Now, a new SCSI command can use Tag 2.
> 
> That is what this looks like.  I'm not aware of any place in the UAS 
> spec where it says that a host is allowed to do this.
> 
> It would be very interesting to know at this point if the host tried to 
> read more data on the data endpoint during the 160-microsecond period 
> before transfer 292.  Does the full log show whether this happened?  
> Does the fact that the transfers have consecutive numbers mean that 
> nothing happened on the bus during this time?  Was the device waiting 
> for the host before sending more data, or did it cancel the data that 
> was already queued?
> 

Here's the packet level view (w/o link commands and misc packets):

_______|_______________________________________________________________________L 
Packet(302161) Left("Left") Dir G2(x1) TP Status(4) ADDR(3) ENDP(0) 
_______| Dir(--) PP(Not Pnd)   LCW  (Hseq:9) Duration(16.540 ns) 
_______| Time(772.000 ns) Time Stamp(10 . 009 669 310) 
_______|_______________________________________________________________________R 
Packet(302169) Right("Right") Dir G2(x1) TP ACK(1) ADDR(3) ENDP(0) 
_______| TT(Control) Dir(--) SeqN(1) NumP(0) Stream ID(0x0000) PP(Not Pnd) 
_______|   LCW  (Hseq:6) Warning Duration(16.540 ns) Time(145.756 us) 
_______| Time Stamp(10 . 009 670 082) 
_______|_______________________________________________________________________L 

## After the device ACK'ed the status stage. Nothing happened until the
## host sends a new command.

Packet(303013) Left("Left") Dir G2(x1) DP Data Len(32) ADDR(3) ENDP(4) 
_______| TT(Bulk) Dir(Out) SeqN(5) EoB(N) Stream ID(0x0000) PP(Not Pnd) 
_______|   LCW  (Hseq:11) Data(32 bytes) Duration(56.240 ns) Time(  1.390 us) 
_______| Time Stamp(10 . 009 815 838) 
_______|_______________________________________________________________________

> > 
> > # Thinking that the previous Tag 2 command was still active and
> > # responded with an OVERLAPPED TAG. This probably causes the gadget to
> > # cancel the transfer and drop the command so it can be in sync again.
> 
> Indeed.  If the Clear-Halt had caused the device to abort the ongoing 
> transfer, would it have responded this way?  Wouldn't it have thought 
> that the previous Tag 2 command was completely finished?  Or would it 
> have cancelled just the data part of the transfer, while keeping track 
> of the fact that the status part still needed to be sent?
> 
> To put it another way, it seems that the OVERLAPPED TAG response was 
> caused by the fact that the status was never sent.  So whether or not 
> the device responded to the Clear-Halt by cancelling anything, it 
> became aware that something was wrong when it received this duplicate 
> tag.
> 
> > 
> > So, for this scenario, the host was probably stay in sync with the
> > device due to the overlap command tag id canceling the transfer. I'm not
> > sure if this is the intent of the Windows UASP class driver, which seems
> > like a non-conventional way of synchronization. Perhaps it was done
> > because some devices may not support TASK_MANAGEMENT(Abort_task)?
> 
> That's possible.  Windows seems to use a non-conventional approach to 
> many things.
> 
> > Regardless, if the host resets the endpoint, the transfer must be
> > canceled otherwise we risk data corruption.
> 
> Do we?  What would have happened if the transfer had not been 
> cancelled?  The device might have sent some data left over from the 
> original command and the host might have misinterpreted it as belong to 
> the new command.  But when the device sent the OVERLAPPED TAG response, 
> the host would have realized that any data it received for the 
> new command was invalid and abandoned the command.
> 

If the host resets the endpoint, the transfer must be canceled. How
cancelation is triggered is another thing. In this case, it may not
be triggered by the CLEAR_FEATURE(halt_ep), but probably through the
overlap tag detection.

> In fact, that's what it did do in transfer 295.  As far as I can tell 
> from the log, the Clear-Halt didn't cause the device to fully cancel 
> the transfer.

> 
> > Also whenever there's a OVERLAPPED tag error, Windows host takes a long
> > time (~1 sec) to send a new command (check delta time of Transfer 295
> > and 313). If the gadget driver can base off of the
> > CLEAR_FEATURE(halt_ep), this improves performance.
> 
> Okay, that's a good point.
> 
> However, since not cancelling the transfer apparently did not lead to 
> corruption, I think it's okay to allow UDC drivers not to cancel 
> requests when they receive a Clear-Halt.  This decision could be left 
> up to the driver.
> 

Ok, fair enough.

For our tests, we based the transaction errors through the
CLEAR_FEATURE(halt_ep) and dropped the MSC command altogether to stay in
sync, just as how the Windows host drops the MSC command.

But as through this long discussion, we may not need and probably can
avoid forcing the gadget driver to learn a new error code to stay in
sync with Windows's host.

Thanks,
Thinh