On Wed, 2020-07-15 at 11:30 +0200, Greg KH wrote: > On Wed, Jul 15, 2020 at 10:58:03AM +0200, David Guillen Fandos wrote: > > Hello linux-usb, > > > > I think I might have found a kernel bug related to the USB > > subsystem > > (cdc_acm perhaps). > > > > Context: I was playing around with a device I'm creating, > > essentially a > > USB quad modem device that exposes four modems to the host system. > > This > > device is still a prototype so there's a few bugs here and there, > > most > > likely in the USB descriptors and control requests. > > > > What happens: After plugging the device the system starts spitting > > warnings and BUGs and it locks up. Most of the time some CPUs get > > into > > some spinloop and never comes back (you can see it being detected > > by > > the watchdog after a few seconds). Generally after that the USB > > devices > > stop working completely and at some point the machine freezes > > completely. In a couple of ocasions I managed to see a bug in dmesg > > saying "unable to handle page fault for address XXX" and > > "Supervisor > > read access in kernel mode" "error code (0x0000) not present page". > > I > > could not get a trace for that one since the kernel died completely > > and > > my log files were truncated/lost. > > > > Since it is happening to my two machines (both Intel but rather > > different controllers, Sunrise Point-LP USB 3.0 vs 8 Series/C220) > > and > > with different kernel versions I suspect this might be a bug in the > > kernel. > > > > I have 4 logs that I collected, they are sort of long-ish, not sure > > how > > to best send them to the list. > > Send the crashes with the callback list, that should be quite small, > right? We don't need the full log. > > The first crash is the most important, the others can be from the > first > one and are not reliable. > > thanks, > > greg k-h Ok then, here comes one of the logs, I selected some bits only [ 147.302016] WARNING: CPU: 3 PID: 134 at kernel/workqueue.c:1473 __queue_work+0x364/0x410 [...] [ 147.302322] Call Trace: [ 147.302329] <IRQ> [ 147.302342] queue_work_on+0x36/0x40 [ 147.302353] __usb_hcd_giveback_urb+0x9c/0x110 [ 147.302362] usb_giveback_urb_bh+0xa0/0xf0 [ 147.302372] tasklet_action_common.constprop.0+0x66/0x100 [ 147.302382] __do_softirq+0xe9/0x2dc [ 147.302391] irq_exit+0xcf/0x110 [ 147.302397] do_IRQ+0x55/0xe0 [ 147.302408] common_interrupt+0xf/0xf [ 147.302413] </IRQ> [...] [ 184.771172] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:2:134] The other machines: Jul 15 01:22:16 localhost kernel: WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:1473 __queue_work+0x342/0x3e0 [...] Jul 15 01:22:16 localhost kernel: Call Trace: Jul 15 01:22:16 localhost kernel: <IRQ> Jul 15 01:22:16 localhost kernel: queue_work_on+0x36/0x40 Jul 15 01:22:16 localhost kernel: __usb_hcd_giveback_urb+0x6f/0x120 Jul 15 01:22:16 localhost kernel: usb_giveback_urb_bh+0xa0/0xf0 Jul 15 01:22:16 localhost kernel: tasklet_action_common.isra.0+0x5b/0x100 Jul 15 01:22:16 localhost kernel: __do_softirq+0xee/0x2ff Jul 15 01:22:16 localhost kernel: irq_exit+0xe9/0xf0 Jul 15 01:22:16 localhost kernel: do_IRQ+0x55/0xe0 Jul 15 01:22:16 localhost kernel: common_interrupt+0xf/0xf Jul 15 01:22:16 localhost kernel: </IRQ> Pretty similar as well. This is the first crash, the other ones mention the cdc-acm but the first ones do not. In one occasion the first crash seems different, here goes: [ 46.713823] ------------[ cut here ]------------ [ 46.713827] refcount_t: underflow; use-after-free. [ 46.713884] WARNING: CPU: 2 PID: 33 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 [...] [ 46.714148] Call Trace: [ 46.714166] acm_disconnect+0x198/0x280 [cdc_acm] [ 46.714188] usb_unbind_interface+0x8a/0x270 [ 46.714198] __device_release_driver+0x15c/0x210 [ 46.714204] device_release_driver+0x24/0x30 [ 46.714209] bus_remove_device+0xdb/0x140 [ 46.714218] device_del+0x16f/0x3f0 [ 46.714227] usb_disable_device+0xd6/0x290 [ 46.714234] usb_disconnect.cold+0x7e/0x205 [ 46.714242] hub_event+0xbfa/0x1810 [ 46.714257] process_one_work+0x1b4/0x380 [ 46.714264] worker_thread+0x53/0x3e0 [ 46.714269] ? process_one_work+0x380/0x380 [ 46.714276] kthread+0x115/0x140 [ 46.714284] ? __kthread_bind_mask+0x60/0x60 [ 46.714293] ret_from_fork+0x35/0x40 Let me know if you need more relevant/complete logs. Will be glad to send them over. Thanks David