On Wed, Jul 15, 2020 at 12:31:42PM +0200, David Guillen Fandos wrote:
> On Wed, 2020-07-15 at 11:30 +0200, Greg KH wrote:
> > On Wed, Jul 15, 2020 at 10:58:03AM +0200, David Guillen Fandos wrote:
> > > Hello linux-usb,
> > >
> > > I think I might have found a kernel bug related to the USB subsystem
> > > (cdc_acm perhaps).
> > >
> > > Context: I was playing around with a device I'm creating, essentially
> > > a USB quad modem device that exposes four modems to the host system.
> > > This device is still a prototype so there's a few bugs here and there,
> > > most likely in the USB descriptors and control requests.
> > >
> > > What happens: After plugging the device the system starts spitting
> > > warnings and BUGs and it locks up. Most of the time some CPUs get into
> > > some spinloop and never come back (you can see it being detected by
> > > the watchdog after a few seconds). Generally after that the USB
> > > devices stop working completely and at some point the machine freezes
> > > completely. In a couple of occasions I managed to see a bug in dmesg
> > > saying "unable to handle page fault for address XXX" and "Supervisor
> > > read access in kernel mode" "error code (0x0000) not present page". I
> > > could not get a trace for that one since the kernel died completely
> > > and my log files were truncated/lost.
> > >
> > > Since it is happening to my two machines (both Intel but rather
> > > different controllers, Sunrise Point-LP USB 3.0 vs 8 Series/C220) and
> > > with different kernel versions I suspect this might be a bug in the
> > > kernel.
> > >
> > > I have 4 logs that I collected, they are sort of long-ish, not sure
> > > how to best send them to the list.
> >
> > Send the crashes with the callback list, that should be quite small,
> > right? We don't need the full log.
> >
> > The first crash is the most important, the others can be from the
> > first one and are not reliable.
> >
> > thanks,
> >
> > greg k-h
>
> Ok then, here comes one of the logs, I selected some bits only
>
> [ 147.302016] WARNING: CPU: 3 PID: 134 at kernel/workqueue.c:1473 __queue_work+0x364/0x410
> [...]
> [ 147.302322] Call Trace:
> [ 147.302329] <IRQ>
> [ 147.302342] queue_work_on+0x36/0x40
> [ 147.302353] __usb_hcd_giveback_urb+0x9c/0x110
> [ 147.302362] usb_giveback_urb_bh+0xa0/0xf0
> [ 147.302372] tasklet_action_common.constprop.0+0x66/0x100
> [ 147.302382] __do_softirq+0xe9/0x2dc
> [ 147.302391] irq_exit+0xcf/0x110
> [ 147.302397] do_IRQ+0x55/0xe0
> [ 147.302408] common_interrupt+0xf/0xf
> [ 147.302413] </IRQ>
> [...]
> [ 184.771172] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:2:134]

That was the first message?

Ok, we need some more logs, how about the 30 lines right before the
above?

And what kernel version are you using?

thanks,

greg k-h