Re: xhci: suspend/resume issues

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Tue, 9 Nov 2010 13:54:15 -0800

On Tue, Nov 09, 2010 at 02:29:55PM -0500, Don Zickus wrote:
> Hello,
> 
> I was trying out 2.6.37-rc1 to see how well suspend/resume works for usb3.
> It didn't work out for me very well.  I received a whole bunch of strange
> warnings and I couldn't talk to my usb3 harddrive no more.
> 
> Help?
> 
> I am running on an old dual core Intel Xeon processor, using a pci-e NEC
> usb3 card, which connects to a Buffalo external harddrive.  After booting
> the kernel I can mount and see everything on the harddrive fine.  But not
> after suspend/resume.
> 
> Attached is my dmesg log which includes boot up, suspending and resuming.
> This dmesg log does _not_ have any usb3 harddrives mounted (but it is 
> connected).
> 
> Let me know if there is any other info I can provide.

Can you recompile with CONFIG_USB_HCD_XHCI_DEBUGGING turned on?  That
way I can get a clearer picture of what's actually happening.  This will
generate a lot more output, so I'm only really interested in the log
file around the time you try to suspend the system, and the device gets
reset.

Analysis of the log:

> xhci_hcd 0000:04:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
...
> usb usb6: Product: xHCI Host Controller
...
> usb 6-2: new SuperSpeed USB device using xhci_hcd and address 2
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> usb 6-2: New USB device found, idVendor=0411, idProduct=0184
> usb 6-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> usb 6-2: Product: HD-HXU3
> usb 6-2: Manufacturer: BUFFALO
> usb 6-2: SerialNumber: 0000010121A3
....
> Initializing USB Mass Storage driver...
> scsi4 : usb-storage 6-2:1.0
> usbcore: registered new interface driver usb-storage
> USB Mass Storage support registered.
> scsi 4:0:0:0: Direct-Access     BUFFALO  External HDD     0100 PQ: 0 ANSI: 5
> sd 4:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
...
> EXT4-fs (sdb2): recovery complete
> EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)
> SELinux: initialized (dev sdb2, type ext4), uses xattr
> SELinux: initialized (dev sdb1, type vfat), uses genfs_contexts

Your device is very unhappy after this point.  Perhaps because of
SELinux?  Some process is doing *something* with the hard drive that the
device doesn't like, even though you say it's not mounted.  I've never
tried enabling SELinux with a USB 3.0 hard drive, so it's possible the
device doesn't like some SCSI command SELinux issues.

> xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
> usb 6-2: reset SuperSpeed USB device using xhci_hcd and address 2
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880034aa85c0
> xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880034aa8600
> xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
> usb 6-2: reset SuperSpeed USB device using xhci_hcd and address 2
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880034aa85c0
> xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880034aa8600

The USB mass storage driver tries to recover by resetting the device and
re-fetching the device descriptors.  I'm can't tell if it succeeded.

> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> Suspending console(s) (use no_console_suspend to debug)
...
> Back to C!
...
> xhci_hcd 0000:04:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> uhci_hcd 0000:00:1d.2: setting latency timer to 64
> [drm] nouveau 0000:05:00.0: We're back, enabling device...
> xhci_hcd 0000:04:00.0: setting latency timer to 64
> nouveau 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> usb usb4: root hub lost power or was reset
> nouveau 0000:05:00.0: setting latency timer to 64
> usb usb6: root hub lost power or was reset
> [drm] nouveau 0000:05:00.0: POSTing device...
> uhci_hcd 0000:00:1d.3: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> uhci_hcd 0000:00:1d.3: setting latency timer to 64
> xhci_hcd 0000:04:00.0: dma_pool_destroy xHCI ring segments, ffff880038f5b000 busy

That's an interesting bug that will cause the xHCI driver to leak
memory.  For some reason, there is an endpoint ring segment (or perhaps
an event ring or command ring segment) that is not freed when the
roothub loses power.  This code in xhci_mem_cleanup() is supposed to
take care of that:

	for (i = 1; i < MAX_HC_SLOTS; ++i)
		xhci_free_virt_device(xhci, i);

	if (xhci->segment_pool)
		dma_pool_destroy(xhci->segment_pool);
	xhci->segment_pool = NULL;

So we know that the entry in the slot array context got set to NULL
before all the ring segments for that context were given back to the
segment pool.

I wonder if an endpoint was stalled, and the mass storage driver
attempted to reset the device before resetting the endpoint?  I don't
think that theory holds water because a stalled endpoint should
transition to the "halted" state, not the "disabled" state.  So I'm not
sure where the xHCI driver is leaking ring segments.

In short, I need the log with full xHCI driver debug. :)

> usb 6-2: USB disconnect, address 2
> xHCI xhci_free_dev called with unaddressed device
> xhci_hcd 0000:04:00.0: Can't reset device (slot ID 1) in enabled/disabled state
> xhci_hcd 0000:04:00.0: Not freeing device rings.
> usb 6-4: new high speed USB device using xhci_hcd and address 3
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> usb 6-4: device descriptor read/8, error -61
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> xhci_hcd 0000:04:00.0: WARN: short transfer on control ep
> usb 6-4: device descriptor read/8, error -61

Your device is completely hosed at this point, and comes back as a high
speed device, rather than a SuperSpeed device.  And then it returns some
other descriptor type when asked for it's device descriptor (-61 is
-ENODATA, which I think corresponds to a line in usb_get_descriptor() in
drivers/usb/core/message.c).

> xhci_hcd 0000:04:00.0: Can't reset device (slot ID 1) in enabled/disabled state
> xhci_hcd 0000:04:00.0: Not freeing device rings.
> usb 6-4: new high speed USB device using xhci_hcd and address 4
> xhci_hcd 0000:04:00.0: WARN: transfer error on endpoint
> xhci_hcd 0000:04:00.0: WARN: transfer error on endpoint
> xhci_hcd 0000:04:00.0: WARN: transfer error on endpoint
> usb 6-4: device descriptor read/8, error -71

And then there's bus noise or the device spewing bad data.

So, first step is to use the same kernel with more xHCI debugging, so I
can see what's up with the device being reset right before the suspend.
Then once that's fixed, you can see if turning off SELinux causes the
issue to go away.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html