Hi Mathias, i got a NULL pointer deref in the usbfs and analyzed it. The connected device seemed to have trouble on USB transmissions. However, I have the impression that there is a race between finishing URB handling and disconnection of devices. In detail: [ 4979.029666] xhci_hcd 0000:00:15.0: USB transfer error. Maybe the USB cable is bad? [ 4979.029721] usb 1-3.2: usbfs: USBDEVFS_CONTROL failed cmd NmeDefaultPrio rqt 192 rq 51 len 2 ret -71 [ 4979.053224] usb 1-3.2: USB disconnect, device number 28 [ 4979.061672] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [ 4979.155537] IP: xhci_triad_to_transfer_ring+0x19/0x80 [xhci_hcd] [ 4979.155539] PGD 0 P4D 0 [ 4979.155543] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 4979.155547] Modules linked in: nls_cp437 nls_utf8 cfq_iosched sd_mod tntfs(PO) texfat(PO) usb_storage bcmdhd(O) squashfs zlib_inflate xz_dec lzo lzo_compress lzo_decompress ebtable_filter ebtables xfrm_user xfrm_algo cls_u32 sch_htb intel_tfm_governor intel_ipu4_psys intel_ipu4_psys_csslib crlmodule cdc_acm snd_soc_apl_mgu_hu ecryptfs dwc3 intel_xhci_usb_role_switch roles udc_core snd_soc_skl ti960 regmap_i2c adv728x sdw_cnl snd_soc_acpi_intel_match snd_soc_acpi intel_ipu4_isys videobuf2_dma_contig snd_soc_core coretemp videobuf2_memops ipu4_acpi intel_ipu4_isys_csslib videobuf2_v4l2 videobuf2_core snd_compress snd_soc_skl_ipc sbi_apl ahci i2c_i801 libahci sdw_bus crc8 snd_soc_sst_ipc libata snd_soc_sst_dsp cfg80211 intel_ipu4_mmu xhci_pci xhci_hcd snd_hda_ext_core snd_hda_core scsi_mod snd_pcm usbcore [ 4979.155599] rfkill usb_common mei_me snd_timer dwc3_pci snd intel_ipu4 mei soundcore iova nfsd auth_rpcgss lockd grace sunrpc zram zsmalloc loop fuse nft_exthdr 8021q bridge stp llc inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm igb_avb(O) firmware_class ptp pps_core hwmon spi_pxa2xx_platform [last unloaded: bcmdhd] [ 4980.581941] CPU: 1 PID: 31853 Comm: NmeDefaultPrio Tainted: P U O 4.14.198-apl #1 [ 4980.683080] task: ffff9f6bf08e3200 task.stack: ffffafb5c04a4000 [ 4980.753930] RIP: 0010:xhci_triad_to_transfer_ring+0x19/0x80 [xhci_hcd] [ 4980.832160] RSP: 0018:ffffafb5c04a7a40 EFLAGS: 00010046 [ 4980.894757] RAX: 0000000000000000 RBX: ffff9f6b52744540 RCX: 0000000000000000 [ 4980.980169] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff9f6c1e9a6228 [ 4981.065584] RBP: ffffafb5c04a7a90 R08: 0000000000000009 R09: 0000000000000000 [ 4981.150995] R10: 0000000000000000 R11: ffff9f6aebd03000 R12: ffff9f6c1e9a6228 [ 4981.236409] R13: 0000000000000000 R14: 0000000001080020 R15: 0000000000000009 [ 4981.321926] FS: 00007ff8010b3700(0000) GS:ffff9f6c37c80000(0000) knlGS:0000000000000000 [ 4981.418896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4981.487644] CR2: 0000000000000030 CR3: 000000024a312000 CR4: 00000000003406a0 [ 4981.573067] Call Trace: [ 4981.602251] ? xhci_queue_ctrl_tx+0x48/0x270 [xhci_hcd] [ 4981.664868] ? __kmalloc+0x176/0x1f0 [ 4981.707679] xhci_urb_enqueue+0x232/0x590 [xhci_hcd] [ 4981.767160] usb_hcd_submit_urb+0x98/0xbc0 [usbcore] [ 4981.826651] ? __pm_runtime_resume+0x47/0x60 [ 4981.877804] ? usbdev_open+0x1eb/0x230 [usbcore] [ 4981.933122] usb_submit_urb+0x2a3/0x5b0 [usbcore] [ 4981.989493] usb_start_wait_urb+0x69/0x170 [usbcore] [ 4982.048981] usb_control_msg+0xd7/0x140 [usbcore] [ 4982.105352] usbdev_ioctl+0x1d61/0x2570 [usbcore] [ 4982.161723] ? do_filp_open+0xaa/0x110 [ 4982.206632] do_vfs_ioctl+0xa3/0x610 [ 4982.249440] ? syscall_slow_exit_work+0x13a/0x1a0 [ 4982.305793] ? __fget+0x71/0xa0 [ 4982.343397] SyS_ioctl+0x74/0x80 [ 4982.382041] do_syscall_64+0x79/0x350 [ 4982.425896] entry_SYSCALL_64_after_hwframe+0x41/0xa6 [ 4982.486414] RIP: 0033:0x7ff81f1fb4a7 [ 4982.529222] RSP: 002b:00007ff8010b24e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 4982.619946] RAX: ffffffffffffffda RBX: 00007ff810000e80 RCX: 00007ff81f1fb4a7 [ 4982.705358] RDX: 00007ff8010b24f0 RSI: 00000000c0185500 RDI: 0000000000000061 [ 4982.790769] RBP: 0000000000000061 R08: 0000000000000300 R09: 0000000000000000 [ 4982.876179] R10: 0000000000000080 R11: 0000000000000246 R12: 0000000000000007 [ 4982.961599] R13: 00007ff80c024648 R14: 00007ff80c024648 R15: 00007ff80c018d10 [ 4983.047033] Code: e5 09 d1 48 8d 04 b0 89 08 5d c3 c3 66 0f 1f 44 00 00 89 d0 41 89 f0 4c 8d 0c 80 4a 8d 04 48 48 c1 e0 04 4a 03 84 c7 90 01 00 00 <f6> 40 30 10 74 20 55 85 c9 48 89 e5 74 3e 48 8b 40 20 44 8b 40 [ 4983.273287] RIP: xhci_triad_to_transfer_ring+0x19/0x80 [xhci_hcd] RSP: ffffafb5c04a7a40 [ 4983.369231] CR2: 0000000000000030 The crash happens here: 000000000000d8c0 <xhci_triad_to_transfer_ring>: xhci_triad_to_transfer_ring(): d8c0:89 d0 mov eax,edx d8c2:41 89 f0 mov r8d,esi d8c5:4c 8d 0c 80 lea r9,[rax+rax*4] d8c9:4a 8d 04 48 lea rax,[rax+r9*2] d8cd:48 c1 e0 04 shl rax,0x4 d8d1:4a 03 84 c7 90 01 00 add rax,QWORD PTR [rdi+r8*8+0x190] d8d8:00 RAX holds *ep, which is NULL: d8d9:f6 40 30 10 test BYTE PTR [rax+0x30],0x10 // if (!(ep->ep_state & EP_HAS_STREAMS)) //********* CRASH RDI: ffff9f6c1e9a6228 RAX: 0000000000000000 ep R08: 0000000000000009 ep_index R09: 0000000000000000 slot_id In sources this is (xhci_ring.c, 451ff): struct xhci_ring *xhci_triad_to_transfer_ring(struct xhci_hcd *xhci, unsigned int slot_id, unsigned int ep_index, unsigned int stream_id) { struct xhci_virt_ep *ep; ep = &xhci->devs[slot_id]->eps[ep_index]; /* Common case: no streams */ if (!(ep->ep_state & EP_HAS_STREAMS)) //********* CRASH return ep->ring; When looking into the dmesg, we see that there is a disconnect happening short before the crash: [ 4979.053224] usb 1-3.2: USB disconnect, device number 28 [ 4979.061672] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 The disconnect message is thrown here (hub.c, 2137ff): void usb_disconnect(struct usb_device **pdev) { struct usb_port *port_dev = NULL; struct usb_device *udev = *pdev; struct usb_hub *hub = NULL; int port1 = 1; /* mark the device as inactive, so any further urb submissions for * this device (and any of its children) will fail immediately. * this quiesces everything except pending urbs. */ usb_set_device_state(udev, USB_STATE_NOTATTACHED); dev_info(&udev->dev, "USB disconnect, device number %d\n", udev->devnum); /* * Ensure that the pm runtime code knows that the USB device * is in the process of being disconnected. */ pm_runtime_barrier(&udev->dev); usb_lock_device(udev); ... and at a later point in this function: usb_remove_ep_devs(&udev->ep0); usb_unlock_device(udev); My assumption is that the transfer_ring is not locked against a disconnect, and doesn't use the device lock. A first idea would be to add a NULL pointer check for *ep in xhci_triad_to_transfer_ring like ep = &xhci->devs[slot_id]->eps[ep_index]; if (!ep) return NULL; but that would cure the symptom, not the root cause. Any idea for a different approach to solve that race? Best regards Carsten ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955