Seagate External SMR drive USB resets (was: Re: [PATCH] uas: Add US_FL_NO_ATA_1X quirk for one more Seagate device)

Jérôme Carretero <cJ-ko@xxxxxxxxxxx> · Wed, 15 Nov 2017 16:43:14 -0500

Hi Hans,

Tests are currently undergoing with drives operating in plain USB mass
storage class. In a first time, I'm filling drives with data
(uncontrolled corpus, just TBs that I have on hand). It looks like the
drives with most usage history are the ones that drop most often.

kernel: usb 3-4.1.1: reset SuperSpeed USB device number 6 using xhci_hcd
kernel: usb 3-4.2.1: reset SuperSpeed USB device number 7 using xhci_hcd
kernel: usb 3-4.3.1.1: reset SuperSpeed USB device number 13 using xhci_hcd
kernel: usb 3-4.3.2.1: reset SuperSpeed USB device number 14 using xhci_hcd
kernel: usb 3-4.4: reset SuperSpeed USB device number 8 using xhci_hcd
kernel: usb 6-4.3.2.1: reset SuperSpeed USB device number 8 using xhci_hcd
kernel: usb 6-4.3.3.1: reset SuperSpeed USB device number 9 using xhci_hcd
kernel: usb 6-4.4.1: reset SuperSpeed USB device number 6 using xhci_hcd

Will provide some more interesting/visual data later.

I'm surprised that the message "reset SuperSpeed USB device ..." is
displayed without prior information about why.
Someone with more background could give hints?

I took a look at the USB MSC code and have few questions / observations:

- It looks like (haven't tested it yet) the CONFIG_DYNAMIC_DEBUG isn't
  used with the USB mass storage debugging infrastructure, please
  confirm? If unused, are we interested to have a patch that would go
  back to regular pr_debug() that can work with dynamic debugging?

  Because with several of these drives / lots of activity / occasional
  issues, it looks like it will be hard to catch (yes I can use usbmon).

- It looks like there is no configurable timeout for USB MSC requests.
  Perhaps the device is not responding in time and this is why it's
  reset?

Best regards,

-- 
Jérôme

On Mon, 13 Nov 2017 12:38:14 -0500
Jérôme Carretero <cJ-ko@xxxxxxxxxxx> wrote:

> Hi Hans,
> 
> On Mon, 13 Nov 2017 10:04:53 +0100
> Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> 
> > Hi,
> > 
> > On 13-11-17 07:14, Jérôme Carretero wrote:  
> > > On Mon, 13 Nov 2017 07:01:30 +0300
> > > Andrey Astafyev <1@xxxxxxxxx> wrote:
> > >     
> > >> 13.11.2017 00:42, Jérôme Carretero пишет:    
> > >>> Nov 12 16:20:59 Bidule kernel: sd 22:0:0:0: [sdaa] tag#2
> > >>> uas_eh_abort_handler 0 uas-tag 3 inflight: CMD OUT
> > >>> [...]
> > >>> Do you see such things?  
> 
> > > For my devices, adding US_FL_NO_ATA_1X to unusual_uas.h didn't
> > > change anything, and while adding US_FL_IGNORE_UAS (using
> > > quirks=0bc2:ab34:u,0bc2:ab38:u) there are still device resets,
> > > but they cause shorter hangs in system activity (~1 second when
> > > UAS was more like ~20).    
> > 
> > The errors you are seeing are write errors. If you're seeing these
> > errors with both the usb-storage and uas drivers then there likely
> > is something wrong with your setup / hardware.  
> 
> My latest drives are Seagate Backup+ Hub 8TB and have ~ 50 hours of
> uptime. I have connected them to different controllers and they do the
> same as the first generation of the same capacity from 2015.
> 
> SMART says that everything is OK on these disks (I have another that
> was RMA'ed and the symptoms of failure are something else), and if
> there were USB errors, the messages wouldn't be at the higher SCSI
> level, I guess I would see "xact failed" USB errors... no?
> 
> > Does the drive in question use an external power-supply or is it
> > USB bus-powered? If it is the latter then that is likely the
> > problem.  
> 
> External power supply & ~2-ft cable provided by Seagate.
> 
> > Anyways things I would check and try to swap are both the cable
> > used, the power-supply used (if any), the USB-port used as well
> > as trying the disk on a completely different computer.  
> 
> I did that. The same thing happens.
> 
> > I've the feeling something is busted with your hardware, it
> > could be the disk itself. Did you mention that this was the first
> > release of a new higher capacity ? Those often have some kinks
> > which are worked out in later revisions.  
> 
> No, that's about the 3rd release I think.
> 
> 
> I really suspect this has to do with GC activity of these SMR drives,
> as if the write activity is throttled or in more spaced bursts (same
> USB-level intensity), then there is no problem.
> 
> I will do longer tests and see if only some of them do that, after
> they have been subjected to similar usage history.
> 
> 
> Best regards,
>