Re: Spurious Mass Storage Device Resets

Rian Hunter <rian@xxxxxxxxx> · Fri, 4 Mar 2016 12:31:03 -0800 (PST)

On Tue, 1 Mar 2016, Alan Stern wrote:
On Mon, 29 Feb 2016, Rian Hunter wrote:

Hello,

I own a JBOD SATA<->USB 3.0 bridge. All information about this device
and my kernel version is below.

My trouble is that every so often this device will undergo a virtual
USB disconnect and then be reconnected. This will cause all drives to
disappear and reappear on the system. All previously mounted file
systems will have to be remounted.

Anyway, you can get more information about the resets and their causes
if you collect a usbmon trace.  See Documentation/usb/usbmon.txt for
instructions.

Thanks for the great tip. I used tcpdump on the bus of my device and
waited a couple of days for the problem to happen again.

I found that whenever I would get the "usb 2-3: reset SuperSpeed USB
device number 2 using xhci_hcd" what was happening at the protocol
level was:

HOST: DO READ
DEVICE: CONFIRMED
HOST: SEND ME DATA
DEVICE: <DATA>
HOST: SEND ME STATUS
DEVICE: SCSI CHECK CONDITION

After the "Check Condition" the stack would initiate a USB reset.
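
For reference when reading such a trace, here is a minimal sketch of
decoding the status stage of the exchange above. It assumes the bridge
speaks USB Mass Storage Bulk-Only Transport, which this thread does not
confirm, and that the 13-byte Command Status Wrapper (CSW) has already
been pulled out of the capture. The function name parse_csw and the
sample bytes are made up for illustration. A status of 1 ("failed") is
the case that would normally be followed by a Request Sense.

```python
import struct

# Bulk-Only Transport CSW: 4-byte signature "USBS", 4-byte tag,
# 4-byte data residue, 1-byte status, all little-endian (13 bytes).
CSW_SIGNATURE = 0x53425355
CSW_STATUS = {0: "passed", 1: "failed", 2: "phase error"}


def parse_csw(raw: bytes) -> dict:
    """Decode a raw 13-byte CSW (hypothetical helper, not kernel code)."""
    if len(raw) != 13:
        raise ValueError("a CSW is exactly 13 bytes, got %d" % len(raw))
    signature, tag, residue, status = struct.unpack("<IIIB", raw)
    if signature != CSW_SIGNATURE:
        raise ValueError("bad CSW signature 0x%08x" % signature)
    return {
        "tag": tag,
        "residue": residue,
        "status": CSW_STATUS.get(status, "reserved"),
    }


# Made-up sample: tag 0x2a, no residue, status 1 (command failed).
sample_csw = struct.pack("<IIIB", CSW_SIGNATURE, 0x2A, 0, 1)
print(parse_csw(sample_csw))
```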

Now that was all well and fine, no interruptions in service. What
eventually caused the entire device to disconnect was this sequence:

HOST: DO READ
DEVICE: CONFIRMED
HOST: SEND ME DATA
DEVICE: <DATA>
HOST: SEND ME STATUS
(300 seconds pass...)
DEVICE: USB URB ECONNRESET

I believe what is happening here is that the firmware of the bridge
device timed out waiting for the SCSI status coming from the actual
HDD (after enough "check condition" return codes the device decided to
die). The bridge firmware sends a final ECONNRESET then it decides to
disconnect completely.

None of this surprises me. The theory that the drives are bad seems
to have more data behind it.

According to "https://en.wikipedia.org/wiki/SCSI_check_condition",
when a "check condition" status is returned, the device goes into
a special "contingent allegiance condition" state and the host *should*
retrieve more information using a "Request Sense" command. The Linux
stack does not seem to be doing this. While it probably couldn't save
the device, it would be extremely valuable information to see in the
logs, especially when the HDDs are operating normally otherwise. Probably
would be worth it to add a special case in usb_stor_invoke_transport()
to handle the "check condition" error more gracefully.

At this point I'd really like to see what information is contained in the
"Request sense" response. Going to hunt for a user-space solution to pull
that for now before I order new drives :)
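
Whatever tool ends up fetching the data, the response still has to be
decoded. Below is a minimal sketch that assumes the drive returns
fixed-format sense data (response code 0x70 or 0x71) and that the raw
bytes are already in hand. The function name decode_fixed_sense is
hypothetical and the sample buffer is made up, not taken from this
device. Only the sense key, ASC and ASCQ fields are extracted; mapping
ASC/ASCQ pairs to text is left to the published SCSI tables.

```python
# Sense key names as defined by the SCSI Primary Commands standard.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}


def decode_fixed_sense(buf: bytes) -> dict:
    """Pull sense key, ASC and ASCQ out of fixed-format sense data."""
    if len(buf) < 14:
        raise ValueError("need at least 14 bytes to reach ASC/ASCQ")
    response_code = buf[0] & 0x7F  # top bit is the VALID flag
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data: 0x%02x" % response_code)
    key = buf[2] & 0x0F  # low nibble of byte 2
    return {
        "deferred": response_code == 0x71,
        "sense_key": SENSE_KEYS.get(key, "key 0x%x" % key),
        "asc": buf[12],   # additional sense code
        "ascq": buf[13],  # additional sense code qualifier
    }


# Made-up 18-byte example: current error, MEDIUM ERROR, ASC/ASCQ 0x11/0x00.
sample_sense = bytes([0x70, 0, 0x03, 0, 0, 0, 0, 0x0A,
                      0, 0, 0, 0, 0x11, 0x00, 0, 0, 0, 0])
print(decode_fixed_sense(sample_sense))
```

A sense key of MEDIUM ERROR or HARDWARE ERROR would point at the drives,
while something like UNIT ATTENTION would point more toward the bridge
or the link.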

Rian