Re: uas driver causes errors when connecting devices to USB3

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Tue, 6 Dec 2011 12:07:20 -0800

On Tue, Dec 06, 2011 at 01:59:00PM -0500, Chuck Ebbert wrote:
> This is still happening on kernel 3.1.1:

Hi Chuck,

The UAS driver error handling is not really up to snuff yet, but I have
been working on it for the past two weeks.  It's not surprising the
driver crashed and burned after a USB transfer error.  The babble could
have been caused by the device, but it could also have been caused by a
bad cable.  Have you tried swapping the cable or the UAS device?

> Jun 29 09:51:40 karryall kernel: [  270.798057] usb 3-1: new SuperSpeed USB device using xhci_hcd and address 2
> Jun 29 09:51:40 karryall kernel: [  270.835905] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:51:40 karryall kernel: [  270.841901] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:51:40 karryall kernel: [  270.847527] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:51:40 karryall kernel: [  270.853401] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:51:40 karryall kernel: [  270.853534] usb 3-1: New USB device found, idVendor=059b, idProduct=0070
> Jun 29 09:51:40 karryall kernel: [  270.853539] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> Jun 29 09:51:40 karryall kernel: [  270.853544] usb 3-1: Product: eGo USB
> Jun 29 09:51:40 karryall kernel: [  270.853547] usb 3-1: Manufacturer: Iomega
> Jun 29 09:51:40 karryall kernel: [  270.853549] usb 3-1: SerialNumber: 0800000000009812
> Jun 29 09:51:40 karryall kernel: [  270.889028] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:51:40 karryall kernel: [  270.895403] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:51:40 karryall mtp-probe: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:0a.0/0000:02:00.0/0000:03:01.0/0000:04:00.0/usb3/3-1"
> Jun 29 09:51:40 karryall mtp-probe: bus: 3, device: 2 was not an MTP device
> Jun 29 09:51:40 karryall kernel: [  271.033532] xhci_hcd 0000:04:00.0: WARN no SS endpoint bMaxBurst
> Jun 29 09:51:40 karryall kernel: [  271.080278] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:51:40 karryall kernel: [  271.082431] scsi10 : uas
> Jun 29 09:51:40 karryall kernel: [  271.083625] usbcore: registered new interface driver uas
> Jun 29 09:51:40 karryall kernel: [  271.090208] scsi 10:0:0:0: Direct-Access     OEM      Ext Hard Disk    0000 PQ: 0 ANSI: 5
> Jun 29 09:51:40 karryall kernel: [  271.099802] Initializing USB Mass Storage driver...
> Jun 29 09:51:40 karryall kernel: [  271.100444] usbcore: registered new interface driver usb-storage
> Jun 29 09:51:40 karryall kernel: [  271.100449] USB Mass Storage support registered.
> Jun 29 09:51:41 karryall kernel: [  271.199930] sd 10:0:0:0: Attached scsi generic sg3 type 0
> Jun 29 09:51:41 karryall kernel: [  271.299047] sd 10:0:0:0: [sdc] 975319088 512-byte logical blocks: (499 GB/465 GiB)
> Jun 29 09:51:41 karryall kernel: [  271.447444] sd 10:0:0:0: [sdc] Write Protect is off
> Jun 29 09:51:41 karryall kernel: [  271.493157] sd 10:0:0:0: [sdc] Cache data unavailable
> Jun 29 09:51:41 karryall kernel: [  271.493161] sd 10:0:0:0: [sdc] Assuming drive cache: write through
> Jun 29 09:51:41 karryall kernel: [  271.548736] xhci_hcd 0000:04:00.0: WARN: babble error on endpoint
> Jun 29 09:51:41 karryall kernel: [  271.548872] xhci_hcd 0000:04:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
> Jun 29 09:51:41 karryall kernel: [  271.548920] xhci_hcd 0000:04:00.0: ERROR Transfer event for disabled endpoint or incorrect stream ring

The last two lines are a bit odd though.  They indicate the xHCI host
didn't like the xHCI driver trying to move the dequeue pointer past the
transfer that caused the babble.  That shouldn't have been an issue.
Getting a transfer event for an incorrect stream ring is also troubling.

What xHCI host are you plugging the UAS device into?  Can you send the
lspci -vvv output?

> Jun 29 09:52:11 karryall kernel: [  301.792042] sd 10:0:0:0: uas_eh_abort_handler tag 0
> Jun 29 09:52:11 karryall kernel: [  301.792052] sd 10:0:0:0: uas_eh_device_reset_handler tag 0
> Jun 29 09:52:11 karryall kernel: [  301.792057] sd 10:0:0:0: uas_eh_target_reset_handler tag 0
> Jun 29 09:52:11 karryall kernel: [  301.792061] sd 10:0:0:0: uas_eh_bus_reset_handler tag 0
> Jun 29 09:52:11 karryall kernel: [  301.903398] usb 3-1: reset SuperSpeed USB device using xhci_hcd and address 2
> Jun 29 09:52:11 karryall kernel: [  301.979613] xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> Jun 29 09:52:11 karryall kernel: [  301.979745] xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880113706d00
> Jun 29 09:52:11 karryall kernel: [  301.979751] xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880113706d40
> Jun 29 09:52:11 karryall kernel: [  301.979756] xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880113706d80
> Jun 29 09:52:11 karryall kernel: [  301.979760] xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880113706dc0
> Jun 29 09:52:11 karryall kernel: [  302.094253] xhci_hcd 0000:04:00.0: WARN no SS endpoint bMaxBurst

That's odd.  Does your Fedora kernel have this patch in it?

commit d23336329fa4c157ed6256d4279a73b87486a1b6
Author: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
Date:   Mon Jun 6 00:53:47 2011 -0700

    xhci: Don't warn about zeroed bMaxBurst descriptor field.
    
    The USB 3.0 specification says that the bMaxBurst field in the SuperSpeed
    Endpoint Companion descriptor is supposed to indicate how many packets a
    SS device can handle before it needs to wait for an explicit handshake
    from the host controller.  A zero value means the device can only handle
    one packet before it needs a handshake.  Remove a warning in the xHCI
    driver that implies this is an invalid value.
    
    Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 0f8e1d2..fcb7f7e 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1215,8 +1215,6 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
                ep_ctx->ep_info2 |= cpu_to_le32(MAX_PACKET(max_packet));
                /* dig out max burst from ep companion desc */
                max_packet = ep->ss_ep_comp.bMaxBurst;
-               if (!max_packet)
-                       xhci_warn(xhci, "WARN no SS endpoint bMaxBurst\n");
                ep_ctx->ep_info2 |= cpu_to_le32(MAX_BURST(max_packet));
                break;
        case USB_SPEED_HIGH:

It wasn't marked for stable, but you might want to apply it to
avoid bug reports about a warning that isn't useful.
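
To illustrate why a zero bMaxBurst is valid, here is a minimal C sketch of
the zero-based encoding the commit message describes.  The helper name
ss_packets_per_burst is hypothetical and is not part of the xHCI driver;
the 0..15 range is the limit the USB 3.0 specification places on the field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the xHCI driver.
 * bMaxBurst is zero-based: 0 means one packet per burst, 15 means sixteen.
 */
static unsigned int ss_packets_per_burst(uint8_t b_max_burst)
{
	/* The USB 3.0 spec limits bMaxBurst to the range 0..15. */
	assert(b_max_burst <= 15);
	return (unsigned int)b_max_burst + 1;
}
```

So a device reporting bMaxBurst = 0 simply handles one packet per burst,
which is why the warning carried no useful information.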

> Jun 29 09:52:12 karryall kernel: [  302.210258] usb 3-1: URB BAD STATUS -108
> Jun 29 09:52:12 karryall kernel: [  302.210276] sd 10:0:0:0: Device offlined - not ready after error recovery
> Jun 29 09:52:12 karryall kernel: [  302.210319] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210333] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210339] sd 10:0:0:0: [sdc] READ CAPACITY(16) failed
> Jun 29 09:52:12 karryall kernel: [  302.210343] sd 10:0:0:0: [sdc]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Jun 29 09:52:12 karryall kernel: [  302.210348] sd 10:0:0:0: [sdc] Sense not available.
> Jun 29 09:52:12 karryall kernel: [  302.210354] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210360] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210366] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210371] sd 10:0:0:0: [sdc] READ CAPACITY failed
> Jun 29 09:52:12 karryall kernel: [  302.210373] sd 10:0:0:0: [sdc]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Jun 29 09:52:12 karryall kernel: [  302.210378] sd 10:0:0:0: [sdc] Sense not available.
> Jun 29 09:52:12 karryall kernel: [  302.210383] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210390] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210397] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210404] sd 10:0:0:0: rejecting I/O to offline device
> Jun 29 09:52:12 karryall kernel: [  302.210411] sd 10:0:0:0: [sdc] Asking for cache data failed
> Jun 29 09:52:12 karryall kernel: [  302.210414] sd 10:0:0:0: [sdc] Assuming drive cache: write through
> Jun 29 09:52:12 karryall kernel: [  302.210420] sdc: detected capacity change from 499363373056 to 0
> Jun 29 09:52:12 karryall kernel: [  302.210530] sd 10:0:0:0: [sdc] Attached SCSI disk

And then the UAS driver doesn't recover from the device reset.  It's not
surprising to me.  The UAS driver is still experimental, and it just
doesn't have good error handling.  You can try the UAS patches I've been
working on, but they really only fix an issue where the device stops
responding to a particular SCSI command, and I don't think they will
help with a transfer error:

http://marc.info/?l=linux-usb&m=132285584308027&w=2
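
For anyone reading the log, the number in "URB BAD STATUS -108" is a
negated Linux errno value.  The sketch below names a few statuses that USB
drivers commonly report.  The helper urb_status_name is hypothetical, and
the numbers are hard-coded from Linux's generic errno headers (an
assumption that holds on x86) so the example does not depend on the build
host.

```c
/*
 * Hypothetical helper: name a few negated errno values that a USB driver
 * may find in urb->status.  The numeric values are the Linux generic
 * errno numbers, written out literally on purpose.
 */
static const char *urb_status_name(int status)
{
	switch (-status) {
	case 0:   return "OK";
	case 32:  return "EPIPE";      /* endpoint stalled */
	case 71:  return "EPROTO";     /* low-level protocol error */
	case 75:  return "EOVERFLOW";  /* babble: more data than expected */
	case 108: return "ESHUTDOWN";  /* device or host controller disabled */
	default:  return "unknown";
	}
}
```

Here urb_status_name(-108) yields "ESHUTDOWN", which fits the device being
taken offline after the reset failed.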

Sarah Sharp