cdc-acm cooldown + Cisco 2960-X = kernel warning + dead USB

Hi folks,

Opengear makes a device (OM2200) that you plug into consoles in order to access them remotely, but the Cisco 2960-X is causing us grief. We can trivially break our device in just 3 steps.

1. Connect the Cisco 2960-X console.
2. (Re)boot our device.
3. Open the Cisco's console device (/dev/ttyACM0) and write to it.

When we were using Linux 5.2.32, this wasn't fatal: it was possible to disconnect and reconnect the Cisco and it would work as expected. The same was observed on our older devices, which run Linux 3.10 on ARM, and on a laptop running macOS 10.13. After we upgraded to Linux 5.4.61, it got much worse. I did some digging, and it seems that the cdc-acm cooldown commit (f4d1cf2ef83caeab212e842fd238cb8353f59fa2) is the cause.

Before I continue, I need to acknowledge that the Cisco 2960-X is really broken. Unlike every other Cisco console I could find to test with, it shows up as USB 2 rather than USB 1, causes the kernel to print warnings, and sends corrupt identity strings:

    usb 2-1.1: new high-speed USB device number 6 using ehci-pci
    usb 2-1.1: config 1 interface 0 altsetting 0 endpoint 0x82 has an invalid bInterval 255, changing to 11
    usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x1 has invalid maxpacket 64
    usb 2-1.1: config 1 interface 1 altsetting 0 bulk endpoint 0x81 has invalid maxpacket 64
    usb 2-1.1: New USB device found, idVendor=05a6, idProduct=0009, bcdDevice= 0.00
    usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
    usb 2-1.1: Product: C�~B�~@~@ल^D
    usb 2-1.1: Manufacturer: C�~B�~@~@ल^D
    usb 2-1.1: SerialNumber: C�~B�~@~@ल^D�~@�~B
    cdc_acm 2-1.1:1.0: ttyACM0: USB ACM device

Despite this, it does seem to work, except when it is connected during boot. In that case, we get this kernel warning:

    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 0 at kernel/workqueue.c:1477 __queue_work+0x25a/0x300
    Modules linked in: xt_CT xt_tcpudp nf_nat_tftp nft_objref nf_conntrack_tftp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables_set nft_chain_nat ip6table_nat ip>
    CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           O      5.4.61-og #1
    Hardware name: Opengear hedgehog/hedgehog, BIOS 698f4312a5-jenkins 08/28/2020
    RIP: 0010:__queue_work+0x25a/0x300
    Code: 94 b5 73 a9 00 01 1f 00 75 0f 65 48 8b 3c 25 00 5d 01 00 f6 47 24 20 75 24 0f 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b e9 79 fe ff ff 48 8d 53 60 83 c9 02 >
    RSP: 0018:ffffb59640114e88 EFLAGS: 00010002
    RAX: ffffa049e7203790 RBX: ffffa049eaba2f00 RCX: ffffa049c79f61b8
    RDX: ffffa049e7203798 RSI: 000000007fffffff RDI: ffffa049eab9ef80
    RBP: ffffa049ea010000 R08: 0000000000000000 R09: ffffb59640114db8
    R10: 0000000000000040 R11: 0000000000000000 R12: 0000000000000003
    R13: 0000000000000007 R14: 0000000000000004 R15: ffffa049e7203790
    FS:  0000000000000000(0000) GS:ffffa049eab80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ff773c1a024 CR3: 000000011e548000 CR4: 00000000000406e0
    Call Trace:
     <IRQ>
     queue_work_on+0x17/0x20
     __usb_hcd_giveback_urb+0x4e/0xb0
     usb_giveback_urb_bh+0x8e/0xe0
     tasklet_action_common.isra.0+0x48/0xa0
     __do_softirq+0xd1/0x213
     irq_exit+0xc8/0xd0
     do_IRQ+0x48/0xd0
     common_interrupt+0xf/0xf
     </IRQ>
    RIP: 0010:cpuidle_enter_state+0x120/0x2a0
    Code: e8 75 8a aa ff 31 ff 49 89 c6 e8 bb 9a aa ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 6a 01 00 00 31 ff e8 54 c3 ae ff fb 45 85 ed <0f> 88 c2 00 00 00 49 63 f5 4c 89 f1 48 8d >
    RSP: 0018:ffffb59640087e80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffdd
    RAX: ffffa049eab9f600 RBX: ffffffff8d650480 RCX: 000000000000001f
    RDX: 0000000000000000 RSI: 00000000803d7d59 RDI: 0000000000000000
    RBP: 00000210865b9db0 R08: 00000210865f689a R09: 000000007fffffff
    R10: ffffa049eab9e700 R11: ffffa049eab9e6e0 R12: ffffa049e78c9000
    R13: 0000000000000002 R14: 00000210865f689a R15: 0000000000000000
     cpuidle_enter+0x24/0x40
     do_idle+0x1bf/0x230
     cpu_startup_entry+0x14/0x20
     start_secondary+0x14a/0x180
     secondary_startup_64+0xa4/0xb0
    ---[ end trace 12a803438e4082c9 ]--

The warning comes from this check in __queue_work(): WARN_ON(!list_empty(&work->entry))

Once this happens, disconnecting and reconnecting the Cisco no longer brings it back; only a reboot seems to get things working again. If we disconnect and reconnect the Cisco without first writing to it, we avoid the issue.


While reverting the cdc-acm cooldown patch gets us back to the not-great-but-not-fatal behaviour, I don't think this is a good long-term solution. I guess that someone (probably me, since I doubt many people have access to one of these things) needs to see if we can make the Cisco 2960-X behave better, maybe by enabling some of the quirks in the cdc-acm driver.

But I also wonder why the cooldown triggers this warning, and whether there is a latent bug in there that is only exposed by a broken device like the Cisco?

Any guidance would be appreciated.

Lincoln



