Re: [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21.4.2020 6.24, Matthew Ruffell wrote:
> Hi,
> 
> I am struggling to debug a regression, and it would be great if we could work
> this out together. I'm not very familiar with the USB subsystem, but I will try
> my best.
> 
> We have had a few reports from users where their USB devices are extremely slow,
> to the point where they are unusable. Their dmesg buffers are filled with:
> 
> [ 14.000130] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
> [ 18.092123] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
> [ 22.172116] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
> [ 26.252116] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
> 
> This only seems to happen with specific hardware combinations. We did a git
> bisect, and determined that the below commit was the root cause:
> 
> commit 4fdc1790e6a9ef22399c6bc6e63b80f4609f3b7e
> Author: Jan-Marek Glogowski <glogow@xxxxxxxxxx>
> Date:   Fri Feb 1 13:52:31 2019 +0100
> Subject: usb: handle warm-reset port requests on hub resume
> 
> We had to revert this commit from the Ubuntu kernels for the time being, but we
> wish to re-apply it once we determine how to fix this regression.
> 
> The problem still exists on the latest 5.7-rc1 kernel, and if we revert the
> above commit, the problem vanishes.
> 
> Logs (from an affected system, it has a LOT of USB devices):
> 
> System Info:
> Ubuntu 18.04, 4.15.0-72-generic kernel.
> 
> lsusb:                                  https://paste.ubuntu.com/p/Syh2StCyyT/
> lsusb -t:                               https://paste.ubuntu.com/p/fG7DdXvh58/
> filtered dmesg, with usbcore.dyndbg=+p: https://paste.ubuntu.com/p/DyY9SJRcdv/
> filtered lspci:                         https://paste.ubuntu.com/p/Kr96PvRmH4/
> 
There are no USB3 devices enumerated.

Do any of the devices connected to the hub support USB 3 speeds (5Gbps or faster)?
If not, could you add one the hub hub?

> My commentary on the problem:
> 
> First the hubs come onto the scene:
> 
> USB 2:
> 
> xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
> usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
> usb usb1: Product: xHCI Host Controller
> hub 1-0:1.0: USB hub found
> 
> USB 3:
> 
> xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
> xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
> usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
> usb usb2: Product: xHCI Host Controller
> hub 2-0:1.0: USB hub found 
> 
> Next, the USB 2 and 3 port peering happens. Let's only look at usb1-port2 and
> usb2-port2.
> 
> usb usb2-port2: peered to usb1-port2
> 
> The next interesting message is hub 1-0 reporting normal state, followed by 
> hub 2-0 reporting okay, then going into hub suspend.
> 
> hub 1-0:1.0: state 7 ports 16 chg 201c evt 0000
> hub 2-0:1.0: state 7 ports 10 chg 0000 evt 0000
> hub 2-0:1.0: hub_suspend
> usb usb2: bus auto-suspend, wakeup 1
> 
> A new USB device is found on port 2. It gets registered under usb 1-2 and not
> usb 2-2.
> 
> usb 1-2: new high-speed USB device number 2 using xhci_hcd
> 
> Following that, hub 2-0 resumes, noting a race event:
> 
> usb usb2: suspend raced with wakeup event
> usb usb2: usb auto-resume
> hub 2-0:1.0: hub_resume
> 
> Things keep on going with usb 1-2 though:
> 
> usb 1-2: New USB device found, idVendor=0424, idProduct=2807
> usb 1-2: Product: USB2807 Hub
> usb 1-2: Manufacturer: Microchip
> hub 1-2:1.0: USB hub found
> 
> Its a Microchip USB hub. It becomes hub 1-2. The next thing to happen is where
> the faults start coming in:
> 
> usb usb2-port2: do warm reset
> 
> A warm reset is issued to usb 2-2. But as we have previously gathered before,
> 2-2 is an "empty" port, peered to 1-2, and the device is under 1-2. This seems
> to be linked with the hub 2-0 suspend and auto-resume event.
> 
> This of course matches the text in the problematic commit message:
> 
>> This just happens, if its the only device on the root hub, the hub
>> therefore resumes and the HCDs status_urb isn't yet available.
>> If a warm-reset request is detected, this sets the hubs event_bits,
>> which will prevent any auto-suspend and allows the hubs workqueue
>> to warm-reset the port later in port_event.
> 
> At the moment, usb 1-2 is the only device on the hub 1-0 root hub. Or so I think
> anyway. I think I am reading correctly that hub 2-0 has no devices yet.
> 
> Anyway, the warm reset is tried again after the first timeout:
> 
> usb usb2-port2: not warm reset yet, waiting 50ms
> usb usb2-port2: not warm reset yet, waiting 200ms
> 
> While this is happening, usb 1-2 is trying to set up the new hub 1-2.
> 
> usb 1-2-port1: status 0101 change 0001
> usb 1-2-port2: status 0101 change 0001
> ...
> 
> A new warm reset is tried:
> 
> usb usb2-port2: not warm reset yet, waiting 200ms
> 
> hub 1-2 starts discovering new devices:
> 
> usb 1-2.1: New USB device found, idVendor=0403, idProduct=6011
> usb 1-2.1: Product: Quad RS232-HS
> usb 1-2.1: Manufacturer: FTDI
> 
> So things seem to be functioning normally at USB 2.0 speeds for usb 1-2. Back
> on the usb 2-2 front:
> 
> usb usb2-port2: not warm reset yet, waiting 200ms
> usb usb2-port2: not enabled, trying warm reset again...
> 
> From there on hub 1-2 keeps on discovering new devices and bringing them up,
> and usb 2-2 keeps trying to warm reset.
> 
> Looking at the problematic code: 
> 
> /* Make sure a warm-reset request is handled by port_event */
> if (type == HUB_RESUME &&
> hub_port_warm_reset_required(hub, port1, portstatus))
> set_bit(port1, hub->event_bits);
> 
> It almost feels like we are missing a check to see if the port in question is
> already in use in another bus. Or if the port is "empty".
> 

Unlike other devices USB 3 hubs will enumerate as both USB 2.0 and USB 3 devices. 

Looks like the USB 3 part of the hub is not working correctly.

My guess is that without the patch the USB 3 part of the hub is not working either. 
Patch probably helps hub driver discover there is a USB 3 device in a SS_INVALID link
state in the first place, and starts recovering it by warm resetting it.


> In any case, reverting this commit on a mainline 5.7-rc1 kernel fixes the problem.
> 
> We have obtained access to an affected machine, and we can gather any debug data
> that might be useful, or test any patches. Let me know what I can do, and I will
> promptly fetch output.


If it's an external hub can you try connecting some other USB 3 device first to a 
USB 3 roothub port, and then the hub to another USB 3 roothub port?

This way the USB 3 roothub (bus) should not be suspended when the USB 3 hub is connected.
Does that help, or change anything

Thanks
-Mathias




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux