Possible race in 4.14 xhci stack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Mathias,

i got a NULL pointer deref in the usbfs and analyzed it.
The connected device seemed to have trouble on USB transmissions.
However, I have the impression that there is a race between finishing URB handling and disconnection of devices.
In detail:
[ 4979.029666] xhci_hcd 0000:00:15.0: USB transfer error. Maybe the USB cable is bad?
[ 4979.029721] usb 1-3.2: usbfs: USBDEVFS_CONTROL failed cmd NmeDefaultPrio rqt 192 rq 51 len 2 ret -71
[ 4979.053224] usb 1-3.2: USB disconnect, device number 28
[ 4979.061672] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 4979.155537] IP: xhci_triad_to_transfer_ring+0x19/0x80 [xhci_hcd]
[ 4979.155539] PGD 0 P4D 0
[ 4979.155543] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 4979.155547] Modules linked in: nls_cp437 nls_utf8 cfq_iosched sd_mod tntfs(PO) texfat(PO) usb_storage bcmdhd(O) squashfs zlib_inflate xz_dec lzo lzo_compress lzo_decompress ebtable_filter ebtables xfrm_user xfrm_algo cls_u32 sch_htb intel_tfm_governor intel_ipu4_psys intel_ipu4_psys_csslib crlmodule cdc_acm snd_soc_apl_mgu_hu ecryptfs dwc3 intel_xhci_usb_role_switch roles udc_core snd_soc_skl ti960 regmap_i2c adv728x sdw_cnl snd_soc_acpi_intel_match snd_soc_acpi intel_ipu4_isys videobuf2_dma_contig snd_soc_core coretemp videobuf2_memops ipu4_acpi intel_ipu4_isys_csslib videobuf2_v4l2 videobuf2_core snd_compress snd_soc_skl_ipc sbi_apl ahci i2c_i801 libahci sdw_bus crc8 snd_soc_sst_ipc libata snd_soc_sst_dsp cfg80211 intel_ipu4_mmu xhci_pci xhci_hcd snd_hda_ext_core snd_hda_core scsi_mod snd_pcm usbcore
[ 4979.155599]  rfkill usb_common mei_me snd_timer dwc3_pci snd intel_ipu4 mei soundcore iova nfsd auth_rpcgss lockd grace sunrpc zram zsmalloc loop fuse nft_exthdr 8021q bridge stp llc inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm igb_avb(O) firmware_class ptp pps_core hwmon spi_pxa2xx_platform [last unloaded: bcmdhd]
[ 4980.581941] CPU: 1 PID: 31853 Comm: NmeDefaultPrio Tainted: P     U     O    4.14.198-apl #1
[ 4980.683080] task: ffff9f6bf08e3200 task.stack: ffffafb5c04a4000
[ 4980.753930] RIP: 0010:xhci_triad_to_transfer_ring+0x19/0x80 [xhci_hcd]
[ 4980.832160] RSP: 0018:ffffafb5c04a7a40 EFLAGS: 00010046
[ 4980.894757] RAX: 0000000000000000 RBX: ffff9f6b52744540 RCX: 0000000000000000
[ 4980.980169] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff9f6c1e9a6228
[ 4981.065584] RBP: ffffafb5c04a7a90 R08: 0000000000000009 R09: 0000000000000000
[ 4981.150995] R10: 0000000000000000 R11: ffff9f6aebd03000 R12: ffff9f6c1e9a6228
[ 4981.236409] R13: 0000000000000000 R14: 0000000001080020 R15: 0000000000000009
[ 4981.321926] FS:  00007ff8010b3700(0000) GS:ffff9f6c37c80000(0000) knlGS:0000000000000000
[ 4981.418896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4981.487644] CR2: 0000000000000030 CR3: 000000024a312000 CR4: 00000000003406a0
[ 4981.573067] Call Trace:
[ 4981.602251]  ? xhci_queue_ctrl_tx+0x48/0x270 [xhci_hcd]
[ 4981.664868]  ? __kmalloc+0x176/0x1f0
[ 4981.707679]  xhci_urb_enqueue+0x232/0x590 [xhci_hcd]
[ 4981.767160]  usb_hcd_submit_urb+0x98/0xbc0 [usbcore]
[ 4981.826651]  ? __pm_runtime_resume+0x47/0x60
[ 4981.877804]  ? usbdev_open+0x1eb/0x230 [usbcore]
[ 4981.933122]  usb_submit_urb+0x2a3/0x5b0 [usbcore]
[ 4981.989493]  usb_start_wait_urb+0x69/0x170 [usbcore]
[ 4982.048981]  usb_control_msg+0xd7/0x140 [usbcore]
[ 4982.105352]  usbdev_ioctl+0x1d61/0x2570 [usbcore]
[ 4982.161723]  ? do_filp_open+0xaa/0x110
[ 4982.206632]  do_vfs_ioctl+0xa3/0x610
[ 4982.249440]  ? syscall_slow_exit_work+0x13a/0x1a0
[ 4982.305793]  ? __fget+0x71/0xa0
[ 4982.343397]  SyS_ioctl+0x74/0x80
[ 4982.382041]  do_syscall_64+0x79/0x350
[ 4982.425896]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 4982.486414] RIP: 0033:0x7ff81f1fb4a7
[ 4982.529222] RSP: 002b:00007ff8010b24e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4982.619946] RAX: ffffffffffffffda RBX: 00007ff810000e80 RCX: 00007ff81f1fb4a7
[ 4982.705358] RDX: 00007ff8010b24f0 RSI: 00000000c0185500 RDI: 0000000000000061
[ 4982.790769] RBP: 0000000000000061 R08: 0000000000000300 R09: 0000000000000000
[ 4982.876179] R10: 0000000000000080 R11: 0000000000000246 R12: 0000000000000007
[ 4982.961599] R13: 00007ff80c024648 R14: 00007ff80c024648 R15: 00007ff80c018d10
[ 4983.047033] Code: e5 09 d1 48 8d 04 b0 89 08 5d c3 c3 66 0f 1f 44 00 00 89 d0 41 89 f0 4c 8d 0c 80 4a 8d 04 48 48 c1 e0 04 4a 03 84 c7 90 01 00 00 <f6> 40 30 10 74 20 55 85 c9 48 89 e5 74 3e 48 8b 40 20 44 8b 40
[ 4983.273287] RIP: xhci_triad_to_transfer_ring+0x19/0x80 [xhci_hcd] RSP: ffffafb5c04a7a40
[ 4983.369231] CR2: 0000000000000030

The crash happens here:
000000000000d8c0 <xhci_triad_to_transfer_ring>:
xhci_triad_to_transfer_ring():
    d8c0:89 d0                mov    eax,edx
    d8c2:41 89 f0             mov    r8d,esi
    d8c5:4c 8d 0c 80          lea    r9,[rax+rax*4]
    d8c9:4a 8d 04 48          lea    rax,[rax+r9*2]
    d8cd:48 c1 e0 04          shl    rax,0x4
    d8d1:4a 03 84 c7 90 01 00 add    rax,QWORD PTR [rdi+r8*8+0x190]
    d8d8:00

RAX holds *ep, which is NULL:
    d8d9:f6 40 30 10          test   BYTE PTR [rax+0x30],0x10 // if (!(ep->ep_state & EP_HAS_STREAMS)) //********* CRASH

RDI: ffff9f6c1e9a6228
RAX: 0000000000000000 ep
R08: 0000000000000009 ep_index
R09: 0000000000000000 slot_id

In sources this is (xhci_ring.c, 451ff):
struct xhci_ring *xhci_triad_to_transfer_ring(struct xhci_hcd *xhci,
unsigned int slot_id, unsigned int ep_index,
unsigned int stream_id)
{
struct xhci_virt_ep *ep;

ep = &xhci->devs[slot_id]->eps[ep_index];
/* Common case: no streams */

if (!(ep->ep_state & EP_HAS_STREAMS)) //********* CRASH

return ep->ring;

When looking into the dmesg, we see that there is a disconnect happening short before the crash:
[ 4979.053224] usb 1-3.2: USB disconnect, device number 28
[ 4979.061672] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030

The disconnect message is thrown here (hub.c, 2137ff):
void usb_disconnect(struct usb_device **pdev)
{
struct usb_port *port_dev = NULL;
struct usb_device *udev = *pdev;
struct usb_hub *hub = NULL;
int port1 = 1;

/* mark the device as inactive, so any further urb submissions for
 * this device (and any of its children) will fail immediately.
 * this quiesces everything except pending urbs.
 */
usb_set_device_state(udev, USB_STATE_NOTATTACHED);
dev_info(&udev->dev, "USB disconnect, device number %d\n",
udev->devnum);

/*
 * Ensure that the pm runtime code knows that the USB device
 * is in the process of being disconnected.
 */
pm_runtime_barrier(&udev->dev);

usb_lock_device(udev);
... and at a later point in this function:
usb_remove_ep_devs(&udev->ep0);
usb_unlock_device(udev);

My assumption is that the transfer_ring is not locked against a disconnect, and doesn't use the device lock.

A first idea would be to add a NULL pointer check for *ep in xhci_triad_to_transfer_ring like
ep = &xhci->devs[slot_id]->eps[ep_index];
if (!ep)
return NULL;

but that would cure the symptom, not the root cause.
Any idea for a different approach to solve that race?

Best regards
Carsten
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux