Re: Spurious Mass Storage Device Resets

On Fri, 4 Mar 2016, Rian Hunter wrote:

> On Fri, 4 Mar 2016, Alan Stern wrote:
> > On Fri, 4 Mar 2016, Rian Hunter wrote:
> >
> >> Thanks for the great tip. I used tcpdump on the bus of my device and
> >> waited for a couple days for the effect to happen again.
> >>
> >> I found that whenever I would get the "usb 2-3: reset SuperSpeed USB
> >> device number 2 using xhci_hcd" what was happening at the protocol
> >> level was:
> >>
> >> HOST: DO READ
> >> DEVICE: CONFIRMED
> >> HOST: SEND ME DATA
> >> DEVICE: <DATA>
> >> HOST: SEND ME STATUS
> >> DEVICE: SCSI CHECK CONDITION
> >>
> >> After the "Check Condition" the stack would initiate a USB reset.
> >
> > More details would help, such as the actual usbmon output for one of
> > those failed commands (plus the following data).
> >
> 
> This sequence is attached as "spurious_reset.pcap"

Here is the relevant portion, translated into the usbmon text format:

        4166c880 0.280079 S Bo:2:003:2 -115 31 = 55534243 7c333100 00e00000 80020a28 001ef9d1 00000070 00000000 000000
        4166c880 0.280123 C Bo:2:003:2 0 31 >

Send a READ(10) command for 57344 bytes (112 blocks, assuming the disk 
uses 512-byte blocks).
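
For anyone who wants to check that reading of the bytes, here is a minimal
sketch that unpacks the 31-byte Command Block Wrapper and the READ(10) CDB
inside it. The helper names (parse_cbw, parse_read10) are made up for this
illustration and are not usb-storage functions; the 512-byte block size is
the same assumption as above.

```python
import struct

# The 31 bytes of the bulk-out transfer, copied from the usbmon line above.
CBW_HEX = ("55534243 7c333100 00e00000 80020a28 "
           "001ef9d1 00000070 00000000 000000")


def parse_cbw(raw: bytes) -> dict:
    """Split a Bulk-Only Transport Command Block Wrapper into its fields."""
    if len(raw) != 31:
        raise ValueError("a CBW is exactly 31 bytes")
    # Header fields are little-endian: signature, tag, data length,
    # flags, LUN, command length.
    sig, tag, data_len, flags, lun, cb_len = struct.unpack_from("<4sIIBBB", raw)
    if sig != b"USBC":
        raise ValueError("bad CBW signature")
    return {
        "tag": tag,
        "data_len": data_len,
        "data_in": bool(flags & 0x80),   # bit 7 set means device-to-host
        "lun": lun & 0x0F,
        "cdb": raw[15:15 + (cb_len & 0x1F)],
    }


def parse_read10(cdb: bytes) -> dict:
    """Decode a READ(10) CDB; SCSI fields are big-endian."""
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb[:10])
    if opcode != 0x28:
        raise ValueError("not a READ(10) command")
    return {"lba": lba, "blocks": blocks}


cbw = parse_cbw(bytes.fromhex(CBW_HEX))
read = parse_read10(cbw["cdb"])
print(cbw["data_len"], read["blocks"], read["blocks"] * 512)
```

The transfer length in the wrapper and the block count in the CDB agree
only under the 512-byte assumption; a drive with 4096-byte logical blocks
would need a different multiplier.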

        eba42640 0.280149 S Bi:2:003:1 -115 57344 <
        eba42640 0.293002 C Bi:2:003:1 -32 49152 = c50636e3 28bff5e6 62a13394 30a2b1b8 76265667 88e30a14 81949db7 3e6ecc7b

The device sends back only 49152 bytes of data, followed by a STALL.
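
As a rough illustration of how to read those two numbers, the sketch below
maps the URB status codes seen in these traces to their Linux errno names
and works out how much data went missing. The table and the function name
describe_bulk_in are hypothetical helpers written for this note; the
512-byte block size is again an assumption.

```python
# Linux errno values as usbmon prints them in the URB status column.
URB_STATUS = {
    0: "success",
    -32: "EPIPE: endpoint stalled",
    -104: "ECONNRESET: URB was unlinked before it completed",
    -115: "EINPROGRESS: URB submitted, not yet completed",
}


def describe_bulk_in(requested: int, received: int, status: int,
                     block_size: int = 512) -> dict:
    """Summarize a bulk-in completion: status name and shortfall."""
    missing = requested - received
    return {
        "status": URB_STATUS.get(status, "unknown"),
        "missing_bytes": missing,
        "missing_blocks": missing // block_size,
    }


result = describe_bulk_in(requested=57344, received=49152, status=-32)
print(result)
```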

        4166c880 0.293096 S Co:2:003:0 s 02 01 0000 0081 0000 0
        4166c880 0.293145 C Co:2:003:0 0 0

Clear the halt condition.
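
The five hex fields after the "s" in that control line are the setup
packet. A small sketch of how they decode, using a hypothetical helper
named decode_setup (only the handful of standard requests needed here are
listed):

```python
STANDARD_REQUESTS = {0x00: "GET_STATUS", 0x01: "CLEAR_FEATURE", 0x03: "SET_FEATURE"}
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint"}
REQUEST_TYPES = ("standard", "class", "vendor", "reserved")


def decode_setup(fields: str) -> dict:
    """Decode 'bmRequestType bRequest wValue wIndex wLength' (hex, usbmon order)."""
    bm_request_type, b_request, w_value, w_index, w_length = (
        int(f, 16) for f in fields.split()
    )
    req_type = REQUEST_TYPES[(bm_request_type >> 5) & 0x03]
    recipient = RECIPIENTS.get(bm_request_type & 0x1F, "other")
    info = {
        "direction": "device-to-host" if bm_request_type & 0x80 else "host-to-device",
        "type": req_type,
        "recipient": recipient,
        "request": (STANDARD_REQUESTS.get(b_request, hex(b_request))
                    if req_type == "standard" else hex(b_request)),
        "length": w_length,
    }
    if recipient == "endpoint":
        # wIndex carries the endpoint address: low nibble is the number,
        # bit 7 is the direction (set means IN).
        info["endpoint"] = w_index & 0x0F
        info["endpoint_in"] = bool(w_index & 0x80)
        if w_value == 0:
            # Feature selector 0 for an endpoint recipient is ENDPOINT_HALT.
            info["feature"] = "ENDPOINT_HALT"
    return info


setup = decode_setup("02 01 0000 0081 0000")
print(setup)
```

The decoded endpoint (number 1, direction IN) is the same bulk-in pipe
that reported the STALL.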

        4166c880 0.293164 S Bi:2:003:1 -115 13 <
        4166c880 0.293217 C Bi:2:003:1 0 13 = 55534253 7c333100 27b50312 3d

Receive the status.  The response is not meaningful; dCSWDataResidue
(0x1203b527) and bCSWStatus (0x3d) are both garbage.  In particular,
since status is not 1, the device did _not_ report Check Condition.  At
this point there is little that usb-storage can do other than a reset.

> >> Now that was all well and fine, no interruptions in service. What
> >> eventually caused the entire device to disconnect was this sequence:
> >>
> >> HOST: DO READ
> >> DEVICE: CONFIRMED
> >> HOST: SEND ME DATA
> >> DEVICE: <DATA>
> >> HOST: SEND ME STATUS
> >> (300 seconds pass...)
> >> DEVICE: USB URB ECONNRESET
> >>
> >> I believe what is happening here is that the firmware of the bridge
> >> device timed out waiting for the SCSI status coming from the actual
> >> HDD (after enough "check condition" return codes the device decided to
> >> die). The bridge firmware sends a final ECONNRESET then it decides to
> >> disconnect completely.
> >
> > No, that doesn't sound right.  SCSI commands typically have a 30-second
> > timeout, so there should have been a reset after 30 seconds, not a
> > disconnect after 300.
> >
> 
> This sequence is attached as "disconnect.pcap". Note that before all of
> this I had changed the command timeout of the block device using the
> equivalent of:
> 
> # echo 300 > /sys/block/sdb/device/timeout

Okay, that explains the long delay.  Incidentally, the ECONNRESET did 
not come from the bridge.  It came from usb-storage, when the transfer 
was aborted.

> >> From "https://en.wikipedia.org/wiki/SCSI_check_condition" it says that
> >> when a "check condition" status is returned, the device goes into
> >> a special "contingent allegiance condition" state and the host *should*
> >> retrieve more information using a "Request Sense" command. The Linux
> >> stack does not seem to be doing this.
> >
> > Not true.  It does do this, very faithfully.
> >
> 
> Ah yes, you're right, now that I'm actually looking at the
> code. Though, I'm not sure if the transport layer is returning
> "USB_STOR_TRANSPORT_FAILED" or "USB_STOR_TRANSPORT_ERROR." From the
> "spurious_reset.pcap" capture, as you'll see, no REQUEST_SENSE is
> being sent.

In that trace, the return code would have been 
USB_STOR_TRANSPORT_ERROR.  usb-storage did not send a REQUEST SENSE 
command because the bridge did not send Check Condition status.
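
Roughly, the decision looks like the sketch below. This is a simplified
model of the behavior described in this thread, written for illustration
only; the function names are hypothetical, the return values are plain
strings borrowing the usb-storage constant names, and the real driver
handles more cases than this.

```python
def classify_csw(signature_ok: bool, tag_ok: bool, status: int) -> str:
    """Map a received CSW to a transport result (simplified model)."""
    if not (signature_ok and tag_ok) or status > 2:
        # Malformed or meaningless status: nothing sensible to act on.
        return "USB_STOR_TRANSPORT_ERROR"
    if status == 0:
        return "USB_STOR_TRANSPORT_GOOD"
    if status == 1:
        # Command failed: this is the Check Condition case.
        return "USB_STOR_TRANSPORT_FAILED"
    # status == 2 is a phase error.
    return "USB_STOR_TRANSPORT_ERROR"


def sends_request_sense(result: str) -> bool:
    """Only a reported command failure leads to a REQUEST SENSE."""
    return result == "USB_STOR_TRANSPORT_FAILED"


# bCSWStatus from the trace was 0x3d, with a valid signature and tag.
result = classify_csw(signature_ok=True, tag_ok=True, status=0x3D)
print(result, sends_request_sense(result))
```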

> Excited to see what you'll be able to glean from the captures, Thanks
> for your help!

You're welcome.

Alan Stern
