Re: xhci: LPM issues using Western Digital harddrive

On Fri, 28 Sep 2012, Don Zickus wrote:

> On Fri, Sep 28, 2012 at 11:39:48AM -0400, Alan Stern wrote:
> > On Fri, 28 Sep 2012, Don Zickus wrote:
> > 
> > > > Probably the device was already hung.  The log showed that an earlier 
> > > > transfer completed with a STALL.  The reason wasn't apparent, although 
> > > > a usbmon trace might give a clue.  It might also show why the device 
> > > > had to be reset.
> > > 
> > > Here is the output of 'cat 4u'
> > 
> > > ffff880240121860 3207118259 S Bo:4:002:3 -115 31 = 55534243 6c000000 00000000 00000600 00000000 00000000 00000000 000000
> > > ffff880240121860 3207118359 C Bo:4:002:3 0 31 >
> > > ffff880240121860 3207118430 S Bi:4:002:4 -115 13 <
> > > ffff880240121860 3207118511 C Bi:4:002:4 0 13 = 55534253 6c000000 00000000 00
> > > ffff880240121860 3207127088 S Bo:4:002:3 -115 31 = 55534243 6d000000 00020000 80000ca1 082e0001 00000000 ec000000 000000
> > > ffff880240121860 3207127197 C Bo:4:002:3 0 31 >
> > > ffff880240122698 3207127238 S Bi:4:002:4 -115 512 <
> > > ffff880240122698 3207143925 C Bi:4:002:4 -32 0
> > > ffff880240121860 3207144021 S Co:4:002:0 s 02 01 0000 0084 0000 0
> > > ffff880240121860 3207220654 C Co:4:002:0 -32 0
> > > ffff880240122698 3207220814 S Co:4:001:0 s 23 03 0004 0004 0000 0
> > > ffff880240122698 3207226644 C Co:4:001:0 0 0
> > 
> > This shows that the problem began when the device was sent a command it
> > didn't recognize: 0xA1, which is a 12-byte ATA pass-through, in this
> > case for an IDENTIFY DEVICE command (0xEC).  Presumably the Western
> > Digital device doesn't support ATA pass-through.  The device halted its
> > bulk-IN endpoint and then replied with a STALL to the
> > Clear-Endpoint-Halt request (which is an invalid response).  This is
> > why the reset was tried.
> > 
> > Don, what does a comparable usbmon trace look like when the drive is 
> > attached to an EHCI controller?  I would expect to see basically the 
> > same thing.
> 
> I attached the usbmon output from the ehci controller and from another,
> working xhci controller.

Here are the corresponding parts of these two traces.  They are
essentially the same:

> cat 1u #ehci

> ffff880240121248 1870202583 S Bo:1:004:3 -115 31 = 55534243 7d000000 00000000 00000600 00000000 00000000 00000000 000000
> ffff880240121248 1870202784 C Bo:1:004:3 0 31 >
> ffff880240121248 1870202866 S Bi:1:004:4 -115 13 <
> ffff880240121248 1870203003 C Bi:1:004:4 0 13 = 55534253 7d000000 00000000 00
> ffff880240121248 1870210619 S Bo:1:004:3 -115 31 = 55534243 7e000000 00020000 80000ca1 082e0001 00000000 ec000000 000000
> ffff880240121248 1870210756 C Bo:1:004:3 0 31 >
> ffff880240121450 1870210839 S Bi:1:004:4 -115 512 <
> ffff880240121450 1870211037 C Bi:1:004:4 -32 0
> ffff880240121248 1870211144 S Co:1:004:0 s 02 01 0000 0084 0000 0
> ffff880240121248 1870211386 C Co:1:004:0 0 0
> ffff880240121248 1870211434 S Bi:1:004:4 -115 13 <
> ffff880240121248 1870211644 C Bi:1:004:4 0 13 = 55534253 7e000000 00020000 01

> cat 8u #NEC xhci controller

> ffff880240121658 1786328000 S Bo:8:003:3 -115 31 = 55534243 7d000000 00000000 00000600 00000000 00000000 00000000 000000
> ffff880240121658 1786328121 C Bo:8:003:3 0 31 >
> ffff880240121658 1786328169 S Bi:8:003:4 -115 13 <
> ffff880240121658 1786328272 C Bi:8:003:4 0 13 = 55534253 7d000000 00000000 00
> ffff880240121658 1786337748 S Bo:8:003:3 -115 31 = 55534243 7e000000 00020000 80000ca1 082e0001 00000000 ec000000 000000
> ffff880240121658 1786337868 C Bo:8:003:3 0 31 >
> ffff880240121248 1786337966 S Bi:8:003:4 -115 512 <
> ffff880240121248 1786341630 C Bi:8:003:4 -32 0
> ffff880240121658 1786341731 S Co:8:003:0 s 02 01 0000 0084 0000 0
> ffff880240121658 1786341990 C Co:8:003:0 0 0
> ffff880240121658 1786427962 S Bi:8:003:4 -115 13 <
> ffff880240121658 1786444081 C Bi:8:003:4 0 13 = 55534253 7e000000 00020000 01

For both of these, the Clear-Endpoint-Halt request succeeded (status 0),
whereas on the original controller it was answered with a STALL (-32).
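
The bulk payloads in these traces can be decoded by hand or with a small
script.  What follows is only a sketch, assuming the standard Bulk-Only
Transport layout (31-byte CBW, 13-byte CSW, little-endian fields).  The
function names decode_cbw and decode_csw are made up for illustration
and are not part of any kernel or library API.

```python
import struct


def decode_cbw(hex_words):
    """Decode a 31-byte Command Block Wrapper given as usbmon hex words."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    if len(raw) != 31:
        raise ValueError("a CBW is 31 bytes, got %d" % len(raw))
    # dCBWSignature, dCBWTag, dCBWDataTransferLength,
    # bmCBWFlags, bCBWLUN, bCBWCBLength
    sig, tag, data_len, flags, lun, cb_len = struct.unpack_from("<4sIIBBB", raw, 0)
    if sig != b"USBC":
        raise ValueError("bad CBW signature")
    cb_len &= 0x1F  # only the low 5 bits carry the command block length
    return {
        "tag": tag,
        "data_length": data_len,
        "direction_in": bool(flags & 0x80),
        "lun": lun & 0x0F,
        "cdb": raw[15:15 + cb_len],
    }


def decode_csw(hex_words):
    """Decode a 13-byte Command Status Wrapper given as usbmon hex words."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    if len(raw) != 13:
        raise ValueError("a CSW is 13 bytes, got %d" % len(raw))
    # dCSWSignature, dCSWTag, dCSWDataResidue, bCSWStatus
    sig, tag, residue, status = struct.unpack("<4sIIB", raw)
    if sig != b"USBS":
        raise ValueError("bad CSW signature")
    return {"tag": tag, "residue": residue, "status": status}


# Payloads copied from the EHCI trace above.
cbw = decode_cbw("55534243 7e000000 00020000 80000ca1 082e0001 "
                 "00000000 ec000000 000000")
csw = decode_csw("55534253 7e000000 00020000 01")

print("CDB opcode 0x%02x, ATA command 0x%02x, %d bytes %s" % (
    cbw["cdb"][0], cbw["cdb"][9], cbw["data_length"],
    "IN" if cbw["direction_in"] else "OUT"))
print("CSW tag 0x%x, status %d, residue %d" % (
    csw["tag"], csw["status"], csw["residue"]))
```

For the EHCI and NEC traces this decodes to a 12-byte CDB with opcode
0xa1 carrying ATA command 0xec and a 512-byte IN transfer, followed by a
CSW with status 1 (command failed) and a residue of 512.  In other
words, once the halt is cleared the drive simply rejects the command.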

There was another significant difference.  The Get-Max-Lun command 
timed out and had to be cancelled (completion status -2) on the
original controller:

> ffff880240121860 3196779908 S Ci:4:002:0 s a1 fe 0000 0000 0001 1 <
> ffff880240121860 3206883050 C Ci:4:002:0 -2 0

But it succeeded on the EHCI and NEC controllers:

> ffff880240121248 1866407704 S Ci:1:004:0 s a1 fe 0000 0000 0001 1 <
> ffff880240121248 1866407862 C Ci:1:004:0 0 1 = 01

> ffff880240121658 1782799679 S Ci:8:003:0 s a1 fe 0000 0000 0001 1 <
> ffff880240121658 1782800135 C Ci:8:003:0 0 1 = 01
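
The difference is easier to see when each submission is paired with its
completion.  The following is a rough sketch, assuming the usbmon 'u'
text format with microsecond timestamps and Linux errno values in the
status field.  The helper names parse_usbmon_line and completion_times
are hypothetical, and the parser only handles the fields that appear in
the lines quoted here.

```python
import errno


def parse_usbmon_line(line):
    """Parse the leading fields of one usbmon 'u' format line."""
    f = line.split()
    ev = {"urb": f[0], "time_us": int(f[1]), "event": f[2], "address": f[3]}
    if f[4] == "s":
        # Submission of a control transfer: the setup packet follows.
        ev["setup"] = {
            "bmRequestType": int(f[5], 16),
            "bRequest": int(f[6], 16),
            "wValue": int(f[7], 16),
            "wIndex": int(f[8], 16),
            "wLength": int(f[9], 16),
        }
    else:
        # Status word; drop any ":interval" style suffix.
        ev["status"] = int(f[4].split(":")[0])
    return ev


def completion_times(lines):
    """Pair S and C events by URB tag and address; return timing and status."""
    pending = {}
    results = []
    for line in lines:
        ev = parse_usbmon_line(line)
        key = (ev["urb"], ev["address"])
        if ev["event"] == "S":
            pending[key] = ev
        elif ev["event"] == "C" and key in pending:
            sub = pending.pop(key)
            status = ev.get("status", 0)
            # Assumes Linux errno numbering, as in the traces.
            name = errno.errorcode.get(-status, "?") if status < 0 else "OK"
            results.append((ev["address"], ev["time_us"] - sub["time_us"],
                            status, name))
    return results


# Get-Max-Lun lines copied from the three traces above
# (bmRequestType 0xa1, bRequest 0xfe).
trace = [
    "ffff880240121860 3196779908 S Ci:4:002:0 s a1 fe 0000 0000 0001 1 <",
    "ffff880240121860 3206883050 C Ci:4:002:0 -2 0",
    "ffff880240121248 1866407704 S Ci:1:004:0 s a1 fe 0000 0000 0001 1 <",
    "ffff880240121248 1866407862 C Ci:1:004:0 0 1 = 01",
    "ffff880240121658 1782799679 S Ci:8:003:0 s a1 fe 0000 0000 0001 1 <",
    "ffff880240121658 1782800135 C Ci:8:003:0 0 1 = 01",
]

results = completion_times(trace)
for address, elapsed_us, status, name in results:
    print("%-12s %10d us  status %4d (%s)" % (address, elapsed_us, status, name))
```

On these lines the original controller's request is unlinked roughly
10.1 seconds after submission, while the EHCI and NEC controllers
complete it in 158 and 456 microseconds respectively.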

I don't know what this implies.  These failures may or may not be 
related to LPM.  The easiest way to find out would be to disable LPM 
in the kernel and then try the original controller again.

Alan Stern
