On Wed, 2020-07-22 at 16:41 +0200, David Guillen Fandos wrote: > On Tue, 2020-07-21 at 10:26 +0200, Greg KH wrote: > > On Mon, Jul 20, 2020 at 10:39:40PM +0200, David Guillen Fandos > > wrote: > > > On Mon, 2020-07-20 at 11:55 -0500, Dan Williams wrote: > > > > On Mon, 2020-07-20 at 01:36 +0200, David Guillen Fandos wrote: > > > > > On Thu, 2020-07-16 at 16:30 +0200, David Guillen Fandos > > > > > wrote: > > > > > > On Wed, 2020-07-15 at 19:03 +0200, David Guillen Fandos > > > > > > wrote: > > > > > > > On Wed, 2020-07-15 at 14:24 +0200, Greg KH wrote: > > > > > > > > On Wed, Jul 15, 2020 at 01:20:54PM +0200, David Guillen > > > > > > > > Fandos > > > > > > > > wrote: > > > > > > > > > On Wed, 2020-07-15 at 13:12 +0200, Greg KH wrote: > > > > > > > > > > On Wed, Jul 15, 2020 at 12:57:14PM +0200, David > > > > > > > > > > Guillen > > > > > > > > > > Fandos > > > > > > > > > > wrote: > > > > > > > > > > > On Wed, 2020-07-15 at 12:50 +0200, Greg KH wrote: > > > > > > > > > > > > On Wed, Jul 15, 2020 at 12:31:42PM +0200, David > > > > > > > > > > > > Guillen > > > > > > > > > > > > Fandos > > > > > > > > > > > > wrote: > > > > > > > > > > > > > On Wed, 2020-07-15 at 11:30 +0200, Greg KH > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > On Wed, Jul 15, 2020 at 10:58:03AM +0200, > > > > > > > > > > > > > > David > > > > > > > > > > > > > > Guillen > > > > > > > > > > > > > > Fandos > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hello linux-usb, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think I might have found a kernel bug > > > > > > > > > > > > > > > related > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > USB > > > > > > > > > > > > > > > subsystem > > > > > > > > > > > > > > > (cdc_acm perhaps). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Context: I was playing around with a > > > > > > > > > > > > > > > device > > > > > > > > > > > > > > > I'm > > > > > > > > > > > > > > > creating, > > > > > > > > > > > > > > > essentially a > > > > > > > > > > > > > > > USB quad modem device that exposes four > > > > > > > > > > > > > > > modems > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > host > > > > > > > > > > > > > > > system. > > > > > > > > > > > > > > > This > > > > > > > > > > > > > > > device is still a prototype so there's a > > > > > > > > > > > > > > > few > > > > > > > > > > > > > > > bugs > > > > > > > > > > > > > > > here > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > there, > > > > > > > > > > > > > > > most > > > > > > > > > > > > > > > likely in the USB descriptors and control > > > > > > > > > > > > > > > requests. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What happens: After plugging the device > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > system > > > > > > > > > > > > > > > starts > > > > > > > > > > > > > > > spitting > > > > > > > > > > > > > > > warnings and BUGs and it locks up. Most > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > CPUs > > > > > > > > > > > > > > > get > > > > > > > > > > > > > > > into > > > > > > > > > > > > > > > some spinloop and never comes back (you > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > see > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > being > > > > > > > > > > > > > > > detected > > > > > > > > > > > > > > > by > > > > > > > > > > > > > > > the watchdog after a few seconds). > > > > > > > > > > > > > > > Generally > > > > > > > > > > > > > > > after > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > USB > > > > > > > > > > > > > > > devices > > > > > > > > > > > > > > > stop working completely and at some point > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > machine > > > > > > > > > > > > > > > freezes > > > > > > > > > > > > > > > completely. In a couple of ocasions I > > > > > > > > > > > > > > > managed > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > see > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > bug > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > dmesg > > > > > > > > > > > > > > > saying "unable to handle page fault for > > > > > > > > > > > > > > > address > > > > > > > > > > > > > > > XXX" > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > "Supervisor > > > > > > > > > > > > > > > read access in kernel mode" "error code > > > > > > > > > > > > > > > (0x0000) > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > present > > > > > > > > > > > > > > > page". > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > could not get a trace for that one since > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > kernel > > > > > > > > > > > > > > > died > > > > > > > > > > > > > > > completely > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > my log files were truncated/lost. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Since it is happening to my two machines > > > > > > > > > > > > > > > (both > > > > > > > > > > > > > > > Intel > > > > > > > > > > > > > > > but > > > > > > > > > > > > > > > rather > > > > > > > > > > > > > > > different controllers, Sunrise Point-LP > > > > > > > > > > > > > > > USB > > > > > > > > > > > > > > > 3.0 > > > > > > > > > > > > > > > vs > > > > > > > > > > > > > > > 8 > > > > > > > > > > > > > > > Series/C220) > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > with different kernel versions I suspect > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > might > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > bug in > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > kernel. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have 4 logs that I collected, they are > > > > > > > > > > > > > > > sort > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > long- > > > > > > > > > > > > > > > ish, > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > sure > > > > > > > > > > > > > > > how > > > > > > > > > > > > > > > to best send them to the list. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Send the crashes with the callback list, > > > > > > > > > > > > > > that > > > > > > > > > > > > > > should > > > > > > > > > > > > > > be > > > > > > > > > > > > > > quite > > > > > > > > > > > > > > small, > > > > > > > > > > > > > > right? We don't need the full log. > > > > > > > > > > > > > > > > > > > > > > > > > > > > The first crash is the most important, the > > > > > > > > > > > > > > others > > > > > > > > > > > > > > can > > > > > > > > > > > > > > be > > > > > > > > > > > > > > from > > > > > > > > > > > > > > the > > > > > > > > > > > > > > first > > > > > > > > > > > > > > one and are not reliable. > > > > > > > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > greg k-h > > > > > > > > > > > > > > > > > > > > > > > > > > Ok then, here comes one of the logs, I > > > > > > > > > > > > > selected > > > > > > > > > > > > > some > > > > > > > > > > > > > bits > > > > > > > > > > > > > only > > > > > > > > > > > > > > > > > > > > > > > > > > [ 147.302016] WARNING: CPU: 3 PID: 134 at > > > > > > > > > > > > > kernel/workqueue.c:1473 > > > > > > > > > > > > > __queue_work+0x364/0x410 > > > > > > > > > > > > > [...] > > > > > > > > > > > > > [ 147.302322] Call Trace: > > > > > > > > > > > > > [ 147.302329] <IRQ> > > > > > > > > > > > > > [ 147.302342] queue_work_on+0x36/0x40 > > > > > > > > > > > > > [ 147.302353] __usb_hcd_giveback_urb+0x9c/0 > > > > > > > > > > > > > x1 > > > > > > > > > > > > > 10 > > > > > > > > > > > > > [ 147.302362] usb_giveback_urb_bh+0xa0/0xf0 > > > > > > > > > > > > > [ 147.302372] tasklet_action_common.constpr > > > > > > > > > > > > > op > > > > > > > > > > > > > .0+0 > > > > > > > > > > > > > x6 > > > > > > > > > > > > > 6/ > > > > > > > > > > > > > 0x > > > > > > > > > > > > > 10 > > > > > > > > > > > > > 0 > > > > > > > > > > > > > [ 147.302382] __do_softirq+0xe9/0x2dc > > > > > > > > > > > > > [ 147.302391] irq_exit+0xcf/0x110 > > > > > > > > > > > > > [ 147.302397] do_IRQ+0x55/0xe0 > > > > > > > > > > > > > [ 147.302408] common_interrupt+0xf/0xf > > > > > > > > > > > > > [ 147.302413] </IRQ> > > > > > > > > > > > > > [...] > > > > > > > > > > > > > [ 184.771172] watchdog: BUG: soft lockup - > > > > > > > > > > > > > CPU#3 > > > > > > > > > > > > > stuck > > > > > > > > > > > > > for > > > > > > > > > > > > > 23s! > > > > > > > > > > > > > [kworker/3:2:134] > > > > > > > > > > > > > > > > > > > > > > > > That was the first message? > > > > > > > > > > > > > > > > > > > > > > > > Ok, we need some more logs, how about the 30 > > > > > > > > > > > > lines > > > > > > > > > > > > right > > > > > > > > > > > > before > > > > > > > > > > > > the > > > > > > > > > > > > above? > > > > > > > > > > > > > > > > > > > > > > > > And what kernel version are you using? > > > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > > > > > > > > > > > greg k-h > > > > > > > > > > > > > > > > > > > > > > Heh I assumed you would find the 3rd stack more > > > > > > > > > > > interesting > > > > > > > > > > > since > > > > > > > > > > > it > > > > > > > > > > > involves more subsystems but anyway, here we got, > > > > > > > > > > > the > > > > > > > > > > > first > > > > > > > > > > > one > > > > > > > > > > > with > > > > > > > > > > > more context. The trigger as you can see is me > > > > > > > > > > > connecting > > > > > > > > > > > the > > > > > > > > > > > USB > > > > > > > > > > > device: > > > > > > > > > > > > > > > > > > > > > > [ 141.445367] usb 1-1: new full-speed USB device > > > > > > > > > > > number > > > > > > > > > > > 5 > > > > > > > > > > > using > > > > > > > > > > > xhci_hcd > > > > > > > > > > > [ 141.573592] usb 1-1: New USB device found, > > > > > > > > > > > idVendor=0483, > > > > > > > > > > > idProduct=5740, bcdDevice= 2.00 > > > > > > > > > > > [ 141.573597] usb 1-1: New USB device strings: > > > > > > > > > > > Mfr=1, > > > > > > > > > > > Product=2, > > > > > > > > > > > SerialNumber=3 > > > > > > > > > > > [ 141.573601] usb 1-1: Product: Quad-UART serial > > > > > > > > > > > USB > > > > > > > > > > > device > > > > > > > > > > > [ 141.573603] usb 1-1: Manufacturer: davidgf.net > > > > > > > > > > > [ 141.573605] usb 1-1: SerialNumber: serialno > > > > > > > > > > > [ 142.375007] cdc_acm 1-1:1.0: ttyACM0: USB ACM > > > > > > > > > > > device > > > > > > > > > > > [ 142.376623] cdc_acm 1-1:1.2: ttyACM1: USB ACM > > > > > > > > > > > device > > > > > > > > > > > [ 142.378350] cdc_acm 1-1:1.4: ttyACM2: USB ACM > > > > > > > > > > > device > > > > > > > > > > > [ 142.379637] cdc_acm 1-1:1.6: ttyACM3: USB ACM > > > > > > > > > > > device > > > > > > > > > > > [ 142.382473] usbcore: registered new interface > > > > > > > > > > > driver > > > > > > > > > > > cdc_acm > > > > > > > > > > > [ 142.382476] cdc_acm: USB Abstract Control > > > > > > > > > > > Model > > > > > > > > > > > driver > > > > > > > > > > > for > > > > > > > > > > > USB > > > > > > > > > > > modems and ISDN adapters > > > > > > > > > > > [ 147.301997] ------------[ cut here ]-------- > > > > > > > > > > > ---- > > > > > > > > > > > [ 147.302016] WARNING: CPU: 3 PID: 134 at > > > > > > > > > > > kernel/workqueue.c:1473 > > > > > > > > > > > __queue_work+0x364/0x410 > > > > > > > > > > > [ 147.302019] Modules linked in: cdc_acm rfcomm > > > > > > > > > > > ccm > > > > > > > > > > > wireguard > > > > > > > > > > > curve25519_x86_64 libchacha20poly1305 > > > > > > > > > > > chacha_x86_64 > > > > > > > > > > > poly1305_x86_64 > > > > > > > > > > > libblake2s blake2s_x86_64 ip6_udp_tunnel > > > > > > > > > > > udp_tunnel > > > > > > > > > > > libcurve25519_generic libchacha > > > > > > > > > > > libblake2s_generic > > > > > > > > > > > nft_fib_inet > > > > > > > > > > > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet > > > > > > > > > > > nf_reject_ipv4 > > > > > > > > > > > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat > > > > > > > > > > > ip6table_nat > > > > > > > > > > > ip6table_mangle ip6table_raw ip6table_security > > > > > > > > > > > iptable_nat > > > > > > > > > > > nf_nat > > > > > > > > > > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 > > > > > > > > > > > libcrc32c > > > > > > > > > > > iptable_mangle > > > > > > > > > > > iptable_raw iptable_security ip_set nf_tables > > > > > > > > > > > nfnetlink > > > > > > > > > > > ip6table_filter > > > > > > > > > > > ip6_tables iptable_filter cmac vboxnetadp(OE) > > > > > > > > > > > vboxnetflt(OE) > > > > > > > > > > > bnep > > > > > > > > > > > vboxdrv(OE) sunrpc vfat fat uvcvideo > > > > > > > > > > > videobuf2_vmalloc > > > > > > > > > > > videobuf2_memops > > > > > > > > > > > videobuf2_v4l2 videobuf2_common videodev btusb > > > > > > > > > > > btrtl > > > > > > > > > > > btbcm > > > > > > > > > > > btintel > > > > > > > > > > > mc > > > > > > > > > > > bluetooth ecdh_generic ecc iTCO_wdt > > > > > > > > > > > iTCO_vendor_support > > > > > > > > > > > mei_hdcp > > > > > > > > > > > intel_rapl_msr dell_laptop x86_pkg_temp_thermal > > > > > > > > > > > intel_powerclamp > > > > > > > > > > > coretemp kvm_intel kvm irqbypass intel_cstate > > > > > > > > > > > intel_uncore > > > > > > > > > > > intel_rapl_perf iwlmvm > > > > > > > > > > > [ 147.302121] snd_hda_codec_hdmi mac80211 > > > > > > > > > > > snd_soc_skl > > > > > > > > > > > snd_soc_sst_ipc > > > > > > > > > > > snd_soc_sst_dsp dell_wmi snd_hda_ext_core > > > > > > > > > > > dell_smbios > > > > > > > > > > > snd_hda_codec_realtek dcdbas libarc4 wmi_bmof > > > > > > > > > > > dell_wmi_descriptor > > > > > > > > > > > snd_soc_acpi_intel_match snd_soc_acpi > > > > > > > > > > > intel_wmi_thunderbolt > > > > > > > > > > > snd_hda_codec_generic snd_soc_core ledtrig_audio > > > > > > > > > > > iwlwifi > > > > > > > > > > > pcspkr > > > > > > > > > > > snd_compress ac97_bus snd_pcm_dmaengine > > > > > > > > > > > snd_hda_intel > > > > > > > > > > > snd_intel_dspcfg > > > > > > > > > > > snd_hda_codec cfg80211 snd_hda_core snd_hwdep > > > > > > > > > > > snd_seq > > > > > > > > > > > snd_seq_device > > > > > > > > > > > joydev snd_pcm rfkill snd_timer snd i2c_i801 > > > > > > > > > > > soundcore > > > > > > > > > > > idma64 > > > > > > > > > > > int3403_thermal intel_hid int3400_thermal > > > > > > > > > > > sparse_keymap > > > > > > > > > > > acpi_thermal_rel mei_me > > > > > > > > > > > intel_xhci_usb_role_switch > > > > > > > > > > > acpi_pad > > > > > > > > > > > roles > > > > > > > > > > > mei > > > > > > > > > > > intel_pch_thermal processor_thermal_device > > > > > > > > > > > intel_rapl_common > > > > > > > > > > > int340x_thermal_zone intel_soc_dts_iosf > > > > > > > > > > > binfmt_misc > > > > > > > > > > > ip_tables > > > > > > > > > > > dm_crypt > > > > > > > > > > > i915 rtsx_pci_sdmmc mmc_core crct10dif_pclmul > > > > > > > > > > > crc32_pclmul > > > > > > > > > > > i2c_algo_bit > > > > > > > > > > > cec crc32c_intel drm_kms_helper nvme > > > > > > > > > > > ghash_clmulni_intel > > > > > > > > > > > drm > > > > > > > > > > > nvme_core > > > > > > > > > > > serio_raw rtsx_pci hid_multitouch wmi i2c_hid > > > > > > > > > > > video > > > > > > > > > > > pinctrl_sunrisepoint pinctrl_intel > > > > > > > > > > > [ 147.302218] fuse > > > > > > > > > > > [ 147.302230] CPU: 3 PID: 134 Comm: kworker/3:2 > > > > > > > > > > > Tainted: > > > > > > > > > > > G IOE 5.7.7-200.fc32.x86_64 #1 > > > > > > > > > > > [ 147.302233] Hardware name: Dell Inc. XPS 13 > > > > > > > > > > > 9350/0PWNCR, > > > > > > > > > > > BIOS > > > > > > > > > > > 1.12.2 > > > > > > > > > > > 12/15/2019 > > > > > > > > > > > [ 147.302260] Workqueue: 0x0 (mm_percpu_wq) > > > > > > > > > > > [ 147.302275] RIP: 0010:__queue_work+0x364/0x410 > > > > > > > > > > > [ 147.302282] Code: e0 f1 69 a9 00 01 1f 00 75 > > > > > > > > > > > 0f > > > > > > > > > > > 65 > > > > > > > > > > > 48 > > > > > > > > > > > 8b > > > > > > > > > > > 3c > > > > > > > > > > > 25 > > > > > > > > > > > c0 8b > > > > > > > > > > > 01 00 f6 47 24 20 75 25 0f 0b 48 83 c4 10 5b 5d > > > > > > > > > > > 41 > > > > > > > > > > > 5c > > > > > > > > > > > 41 > > > > > > > > > > > 5d > > > > > > > > > > > 41 > > > > > > > > > > > 5e > > > > > > > > > > > 41 5f > > > > > > > > > > > c3 <0f> 0b e9 78 fe ff ff 41 83 cc 02 49 8d 57 60 > > > > > > > > > > > e9 5d > > > > > > > > > > > fe > > > > > > > > > > > ff > > > > > > > > > > > ff e8 > > > > > > > > > > > 53 > > > > > > > > > > > [ 147.302286] RSP: 0018:ffffbab980154e68 EFLAGS: > > > > > > > > > > > 00010002 > > > > > > > > > > > [ 147.302292] RAX: ffff8f551b333790 RBX: > > > > > > > > > > > 0000000000000048 > > > > > > > > > > > RCX: > > > > > > > > > > > 0000000000000000 > > > > > > > > > > > [ 147.302295] RDX: ffff8f551b333798 RSI: > > > > > > > > > > > ffff8f5575803718 > > > > > > > > > > > RDI: > > > > > > > > > > > ffff8f5576daa840 > > > > > > > > > > > [ 147.302299] RBP: ffff8f551b333790 R08: > > > > > > > > > > > ffffffff97856cb0 > > > > > > > > > > > R09: > > > > > > > > > > > 0000000000000000 > > > > > > > > > > > [ 147.302302] R10: 0000000000000000 R11: > > > > > > > > > > > ffffffff97856cb8 > > > > > > > > > > > R12: > > > > > > > > > > > 0000000000000003 > > > > > > > > > > > [ 147.302306] R13: 0000000000002000 R14: > > > > > > > > > > > ffff8f5575c14e00 > > > > > > > > > > > R15: > > > > > > > > > > > ffff8f5576db0700 > > > > > > > > > > > [ 147.302311] FS: 0000000000000000(0000) > > > > > > > > > > > GS:ffff8f5576d80000(0000) > > > > > > > > > > > knlGS:0000000000000000 > > > > > > > > > > > [ 147.302315] CS: 0010 DS: 0000 ES: 0000 CR0: > > > > > > > > > > > 0000000080050033 > > > > > > > > > > > [ 147.302319] CR2: 00000000000000b0 CR3: > > > > > > > > > > > 0000000267774004 > > > > > > > > > > > CR4: > > > > > > > > > > > 00000000003606e0 > > > > > > > > > > > [ 147.302322] Call Trace: > > > > > > > > > > > [ 147.302329] <IRQ> > > > > > > > > > > > [ 147.302342] queue_work_on+0x36/0x40 > > > > > > > > > > > [ 147.302353] __usb_hcd_giveback_urb+0x9c/0x110 > > > > > > > > > > > [ 147.302362] usb_giveback_urb_bh+0xa0/0xf0 > > > > > > > > > > > > > > > > > > > > Are you sure your device is working properly and > > > > > > > > > > talking > > > > > > > > > > USB > > > > > > > > > > correctly > > > > > > > > > > to the host? It looks like you are just timing out > > > > > > > > > > for > > > > > > > > > > some > > > > > > > > > > reason. > > > > > > > > > > > > > > > > > > > > But, that warning is showing that something is odd > > > > > > > > > > in > > > > > > > > > > the > > > > > > > > > > usb > > > > > > > > > > workqueue, > > > > > > > > > > which is strange. > > > > > > > > > > > > > > > > > > > > What type of host controller is this talking > > > > > > > > > > to? And > > > > > > > > > > does > > > > > > > > > > your > > > > > > > > > > device > > > > > > > > > > actually answer the urbs being sent to it > > > > > > > > > > correctly? > > > > > > > > > > > > > > > > > > > > Using usbmon on this might be the best way to watch > > > > > > > > > > the > > > > > > > > > > USB > > > > > > > > > > traffic, > > > > > > > > > > if > > > > > > > > > > you don't have a hardware protocol sniffer, which > > > > > > > > > > could > > > > > > > > > > provide > > > > > > > > > > some > > > > > > > > > > clues as to what is going wrong. > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > > > > > > > greg k-h > > > > > > > > > > > > > > > > > > As I mentioned the device is likely buggy, since I'm > > > > > > > > > developing > > > > > > > > > and > > > > > > > > > debugging it. > > > > > > > > > However my ability to debug and fix any issue is > > > > > > > > > limited by > > > > > > > > > the > > > > > > > > > fact > > > > > > > > > that the kernel decides to stop working as usual, > > > > > > > > > making my > > > > > > > > > USB > > > > > > > > > keyboard and mouse useless, if not crashing later due > > > > > > > > > to > > > > > > > > > soft > > > > > > > > > lockups. > > > > > > > > > > > > > > > > > > Shouldn't the kernel be resilient to such devices? > > > > > > > > > > > > > > > > Yes it should, we should not crash. > > > > > > > > > > > > > > > > > > > > > > > > > I've developed quite > > > > > > > > > a few USB devices in the past and I've never ran into > > > > > > > > > things > > > > > > > > > like > > > > > > > > > this > > > > > > > > > on Linux (Windows is another story, rather 'easy' to > > > > > > > > > crash, > > > > > > > > > hang > > > > > > > > > or > > > > > > > > > bluescreen). In any case since I do not have access > > > > > > > > > to > > > > > > > > > a > > > > > > > > > hardware > > > > > > > > > debugger and the machine goes bananas (preventing me > > > > > > > > > from > > > > > > > > > using > > > > > > > > > Wireshark) I do not think I can further debug this > > > > > > > > > issue. I > > > > > > > > > could > > > > > > > > > try > > > > > > > > > to find a kernel version where this does not crash > > > > > > > > > the > > > > > > > > > machine > > > > > > > > > (only > > > > > > > > > tested 5.6.X and 5.7.X so far). Or perhaps use > > > > > > > > > VirtualBox, > > > > > > > > > but > > > > > > > > > I'd > > > > > > > > > need > > > > > > > > > to convice the host OS to ignore the USB device and > > > > > > > > > just > > > > > > > > > forward > > > > > > > > > it > > > > > > > > > to > > > > > > > > > the guest. > > > > > > > > > > > > > > > > Trying to trace down what part of the setup is failing, > > > > > > > > by > > > > > > > > using > > > > > > > > usbmon, > > > > > > > > will be good to try to figure out what the problem is > > > > > > > > here, > > > > > > > > if > > > > > > > > you > > > > > > > > can > > > > > > > > do that. > > > > > > > > > > > > > > > > > The firmware for this device can be easily tweaked to > > > > > > > > > expose > > > > > > > > > an > > > > > > > > > arbitrary number (up to 7 I think) of CDC ACM > > > > > > > > > interfaces. > > > > > > > > > When > > > > > > > > > I > > > > > > > > > use > > > > > > > > > one or two there's no issues, three has had some > > > > > > > > > issues > > > > > > > > > (but > > > > > > > > > did > > > > > > > > > not > > > > > > > > > investigate further). Going to four is what > > > > > > > > > consistently > > > > > > > > > triggers > > > > > > > > > kernel issues. > > > > > > > > > > > > > > > > Hm, that might be a clue, what does the output of > > > > > > > > 'lsusb > > > > > > > > -v' > > > > > > > > for > > > > > > > > that > > > > > > > > device when you have 3 and then 4 interfaces? > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > > > greg k-h > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > I will try to see how I can debug further, perhaps I can > > > > > > > locate > > > > > > > a > > > > > > > machine or kernel that does not crash. Another option can > > > > > > > be to > > > > > > > disable > > > > > > > the firmware down to the minimum so that it does not > > > > > > > response > > > > > > > to > > > > > > > the > > > > > > > bulk endpoints (just to the enumeration and some basic > > > > > > > things), > > > > > > > to > > > > > > > rule > > > > > > > out a bad behaviour. > > > > > > > > > > > > > > The USB descriptor is what you could imagine, just > > > > > > > replicate > > > > > > > the > > > > > > > two > > > > > > > ACM interfaces (control & data) and add more endpoints. > > > > > > > Here > > > > > > > goes > > > > > > > the > > > > > > > one with three ports. Note this one seems to make the > > > > > > > kernel > > > > > > > crash > > > > > > > just > > > > > > > like the one with four. The only ones that work well are > > > > > > > 1 > > > > > > > and > > > > > > > 2 > > > > > > > ports. > > > > > > > Since I'm not aware of any other commercial solutions > > > > > > > (apart > > > > > > > from > > > > > > > FTDI) > > > > > > > that use more than 2 ACM ports, could that be the issue? > > > > > > > Meaning > > > > > > > there's a bug somewhere and no commercial hardware that > > > > > > > can > > > > > > > trigger > > > > > > > it. > > > > > > > > > > > > > > For reference the diff between two and three ports (in > > > > > > > lsusb) > > > > > > > is > > > > > > > that > > > > > > > it's missing the last two interaces (with the 3 EPs > > > > > > > described). > > > > > > > Of > > > > > > > course the bNumInterfaces is 4 instead of 6, and > > > > > > > wTotalLength > > > > > > > has > > > > > > > a > > > > > > > different value. > > > > > > > > > > > > > > Hope this can help somehow. > > > > > > > Thanks > > > > > > > David > > > > > > > > > > > > > > Bus 003 Device 012: ID 0483:5740 STMicroelectronics > > > > > > > Virtual > > > > > > > COM > > > > > > > Port > > > > > > > > > > > > Hey again, > > > > > > > > > > > > I was not aware about the modems Daniele, thanks! > > > > > > > > > > > > So I did some testing on my old BeagleBone black, which has > > > > > > a > > > > > > very > > > > > > old > > > > > > kernel (3.8.13-bone47). In this device the kernel is happy > > > > > > and I > > > > > > was > > > > > > able to do some testing, it seems to work well. The UARTs > > > > > > seem to > > > > > > work > > > > > > well in both directions, no weird shenanigans, no > > > > > > error/warn > > > > > > messages... > > > > > > > > > > > > I'm a bit at loss on how I can debug this further, I will > > > > > > try > > > > > > to > > > > > > use > > > > > > a > > > > > > RPi with a newer kernel and see what happens. I could try > > > > > > to > > > > > > boot > > > > > > a > > > > > > Live USB with older kernels (in my Intel machines) to try > > > > > > to > > > > > > locate > > > > > > a > > > > > > version where it works. Since I'm no kernel expert: any way > > > > > > I > > > > > > can > > > > > > provide more info? The computer becomes unusable shortly > > > > > > after > > > > > > plugging > > > > > > the device so I can't really do any meaningful stuff on it. > > > > > > > > > > > > Thanks again, > > > > > > David > > > > > > > > > > > > > > > > Hey there again! > > > > > > > > > > I managed to get a PCAP capture for this. Note that > > > > > NetworkManager > > > > > was > > > > > running and actively probing the ttyACM* devices for a modem, > > > > > hence > > > > > why > > > > > you can see "AT\n" commands being sent to the four devices. > > > > > > > > Do you mean ModemManager? NM moved the probing to ModemManager > > > > about > > > > 6 > > > > or 7 years ago. In any case, ModemManager has both a "don't > > > > probe" > > > > list, a greylist, and build-time options to only probe things > > > > it > > > > knows > > > > are modems. > > > > > > > > Of course the build-time policy depends on your distro; > > > > upstream > > > > ModemManager now defaults to "strict" mode (only probe known > > > > modems/drivers/USB IDs). > > > > > > > > Dan > > > > > > > > > As you can also probably see is that the device currently > > > > > ignores > > > > > any > > > > > control requests (like Set Line Coding). > > > > > > > > > > Hope it can help your debugging. > > > > > > > > > > David > > > > > > > > > > > > > > > > Yeah ModemManager. I disabled it to better debug. > > > > > > So I came to the bottom of the issue (device side). This device > > > does > > > not support more than 3 IN and 3 OUT endpoints. Whenever you > > > specify > > > more they start behaving very weirdly and Linux/Wireshark shows > > > the > > > EPROTO errors. In almost all cases that correlates with a kernel > > > crash. > > > > Ah, that makes more sense, so the device itself is just not > > responding > > to USB commands properly. > > > > > I've seen some EPROTO errors at the begining of a trace on > > > working > > > devices, I'm assuming it'd what happens during the period the > > > device is > > > plugged and probed and before it gets full ready (all EPs are > > > setup > > > and > > > able to respond). > > > > > > Now coming around the restriction I've been able to use 3 UARTs > > > without > > > a problem, also on another device from the same family. That > > > explains > > > why it did not work, does not explain why the kernel gets a CPU > > > stuck > > > in some busy loop/interrupt loop though. > > > > I agree, it's not good that we can hang with a broken device like > > this, > > but if the hardware is the thing that is locking up, it might be > > hard > > for us to do anything about this. Timing out and causing errors > > like > > you see might be the only thing that we can do. > > > > > From my side there's not much more I can do. I could try to get > > > this > > > setup in a simpler device and ship it someone willing to debug > > > it. > > > I'm > > > totally convinced this is a bug in the USB stack given it crashes > > > several computers with relateively different hardware. > > > > I would be interested to see what happens when you plug it into > > other > > operating systems. Do they also lock up, or can they handle > > devices > > like this? > > > > thanks, > > > > greg k-h > > Hey there, > > So I actually tested some Live media on the same computer and it is > interesting to see how other distros did not fail at all with the > same > kernel version as Fedora's :/ (I tested CloneZilla 5.7.0 kernel) > > So I dug a bit more and realized that the kernel wont go bananas if > ModemManager is not running. I'm assuming this means the kernel only > has issues whenever some software is poking the device (which sort of > makes sense, but at the same time should mean that the enumeration > and > config bits are 'ok'). > > Also running ModemManager after plugging the device results sometimes > in different behaviour. In some occasions I didnt get it to crash (or > at least not as reliably) but it would print: I don't think Fedora has yet switched to ModemManager's "strict" device probing policy (which only probes known modems based on USB IDs and driver types) so that might be the discrepancy. But I still wouldn't expect the kernel to crash or BUG, even when the device is opened and sent some traffic. Worst case a timeout. Dan > [12515.209323] xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq > Ptr > [12515.209324] xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is > pending. > > I'm not sure whether this is helpful or not. I haven't tried Windows > since I do not have any computer (and using VMs doesnt really seem to > work well). > > Thanks! > David > >