On Fri, 4 Mar 2016, Rian Hunter wrote:

> Thanks for the great tip. I used tcpdump on the bus of my device and
> waited for a couple days for the effect to happen again.
>
> I found that whenever I would get the "usb 2-3: reset SuperSpeed USB
> device number 2 using xhci_hcd" what was happening at the protocol
> level was:
>
> HOST: DO READ
> DEVICE: CONFIRMED
> HOST: SEND ME DATA
> DEVICE: <DATA>
> HOST: SEND ME STATUS
> DEVICE: SCSI CHECK CONDITION
>
> After the "Check Condition" the stack would initiate a USB reset.

More details would help, such as the actual usbmon output for one of
those failed commands (plus the following data).

> Now that was all well and fine, no interruptions in service. What
> eventually caused the entire device to disconnect was this sequence:
>
> HOST: DO READ
> DEVICE: CONFIRMED
> HOST: SEND ME DATA
> DEVICE: <DATA>
> HOST: SEND ME STATUS
> (300 seconds pass...)
> DEVICE: USB URB ECONNRESET
>
> I believe what is happening here is that the firmware of the bridge
> device timed out waiting for the SCSI status coming from the actual
> HDD (after enough "check condition" return codes the device decided to
> die). The bridge firmware sends a final ECONNRESET then it decides to
> disconnect completely.

No, that doesn't sound right. SCSI commands typically have a 30-second
timeout, so there should have been a reset after 30 seconds, not a
disconnect after 300.

> None of this surprises me. The theory that the drives are bad seems
> to have more data behind it.
>
> From "https://en.wikipedia.org/wiki/SCSI_check_condition" it says that
> when a "check condition" status is returned, the device goes into
> a special "contingent allegiance condition" state and the host *should*
> retrieve more information using a "Request Sense" command. The Linux
> stack does not seem to be doing this.

Not true. It does do this, very faithfully.
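For anyone reading a usbmon capture of this kind of failure, here is a minimal sketch that decodes two of the structures under discussion. It assumes the Bulk-Only transport, in which (as we understand it) a SCSI CHECK CONDITION shows up on the wire as a 13-byte Command Status Wrapper with status 1 ("command failed"), after which usb-storage sends REQUEST SENSE and the reply carries the sense key and ASC/ASCQ. The helper names (hex_words, parse_csw, parse_sense) and the sample payloads are hypothetical, made up for illustration and not taken from Rian's trace. It also assumes the hex words have been copied by hand from the data field of a usbmon text line.

```python
import struct
from dataclasses import dataclass

# "USBS" as it appears on the wire; read as a little-endian u32 it is 0x53425355.
CSW_SIGNATURE = 0x53425355

# bCSWStatus values defined for the Bulk-Only transport.
CSW_STATUS = {0: "passed", 1: "failed", 2: "phase error"}

# A subset of SCSI sense keys; anything else is reported as "other".
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}


@dataclass
class Csw:
    tag: int
    residue: int
    status: int


@dataclass
class Sense:
    key: int
    asc: int
    ascq: int

    def describe(self) -> str:
        name = SENSE_KEYS.get(self.key, "other")
        return f"sense key {self.key:#x} ({name}), ASC/ASCQ {self.asc:02x}/{self.ascq:02x}"


def hex_words(text: str) -> bytes:
    """Hypothetical helper: turn space-separated hex words (as usbmon prints them) into bytes."""
    return bytes.fromhex(text)


def parse_csw(data: bytes) -> Csw:
    """Decode a 13-byte Command Status Wrapper (all multi-byte fields little-endian)."""
    if len(data) != 13:
        raise ValueError(f"CSW must be 13 bytes, got {len(data)}")
    sig, tag, residue, status = struct.unpack("<IIIB", data)
    if sig != CSW_SIGNATURE:
        raise ValueError(f"bad CSW signature {sig:#010x}")
    return Csw(tag, residue, status)


def parse_sense(data: bytes) -> Sense:
    """Decode fixed-format (0x70/0x71) or descriptor-format (0x72/0x73) sense data."""
    code = data[0] & 0x7F
    if code in (0x70, 0x71) and len(data) >= 14:
        # Fixed format: key in byte 2 (low nibble), ASC in byte 12, ASCQ in byte 13.
        return Sense(data[2] & 0x0F, data[12], data[13])
    if code in (0x72, 0x73) and len(data) >= 4:
        # Descriptor format: key in byte 1 (low nibble), ASC in byte 2, ASCQ in byte 3.
        return Sense(data[1] & 0x0F, data[2], data[3])
    raise ValueError(f"unrecognized sense data (response code {code:#x})")


if __name__ == "__main__":
    # Made-up sample payloads, not taken from the trace discussed in this thread.
    csw = parse_csw(hex_words("55534253 04000000 00000000 01"))
    print(csw.tag, CSW_STATUS.get(csw.status, "reserved"))
    sense = parse_sense(hex_words("70000300 0000000a 00000000 11000000 0000"))
    print(sense.describe())
```

Run as a script, this prints the tag and status of the sample CSW ("4 failed") and then the decoded sample sense data (MEDIUM ERROR with ASC/ASCQ 11/00, which SPC lists as an unrecovered read error). With a real trace, the inputs would instead be the data words from the failed command's status completion and from the REQUEST SENSE data transfer that follows it.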

> While it probably couldn't save the device, it would be extremely
> valuable information to see in the logs, especially when the HDDs are
> operating normally otherwise. Probably would be worth it to add a
> special case in usb_stor_invoke_transport() to handle the "check
> condition" error more gracefully.

That code is already in there.

> At this point I'd really like to see what information is contained in
> the "Request sense" response. Going to hunt for a user-space solution
> to pull that for now before I order new drives :)

It should all be in your usbmon output. If you would post that here
(just the relevant portions, not the whole several-day-long trace), I
could tell you more.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html