On Wed, 8 Nov 2017, Philipp Kern wrote: > Hi, > > I encountered a bunch of NULL pointer dereferences with the same stack > trace: > > [ 25.774914] usb 3-2: USB disconnect, device number 3 > [ 30.769840] BUG: unable to handle kernel NULL pointer dereference at > (null) > [ 30.769846] IP: [<ffffffff8161b125>] usb_ifnum_to_if+0x25/0x60 > [ 30.769851] PGD 0 > [ 30.769852] Oops: 0000 [#1] SMP > [ 30.769854] Modules linked in: nvidia_uvm(POE) nvidia(POE) > snd_hda_codec_hdmi btusb btrtl btbcm ppdev btintel joydev bnep > intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp uvcvideo > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core > v4l2_common kvm videodev ftdi_sio media snd_usb_audio bluetooth > usbserial irqbypass snd_usbmidi_lib crct10dif_pclmul crc32_pclmul > snd_rawmidi ghash_clmulni_intel snd_seq_device serio_raw aesni_intel > hid_multitouch aes_x86_64 lrw snd_hda_codec_realtek gf128mul glue_helper > ablk_helper cryptd snd_hda_codec_generic drm lpc_ich snd_hda_intel > snd_hda_codec snd_hda_core snd_hwdep snd_pcm parport_pc parport > 8250_fintek snd_timer snd wmi soundcore mei_me shpchp udlfb(OE) mei > mac_hid fb_sys_fops syscopyarea sysfillrect sysimgblt hid_sis(OE) > hid_mml(OE) 99xx(OE) hid_generic usbhid hid e1000e psmouse ahci libahci > ptp pps_core fjes video > [ 30.769890] CPU: 1 PID: 1597 Comm: [redacted] Tainted: P OE > 4.4.0-97-generic #120-Ubuntu > [ 30.769893] Hardware name: LENOVO 30AJS07000/SHARKBAY, BIOS FBKTB6AUS > 08/02/2015 > [ 30.769894] task: ffff880407beaa00 ti: ffff880405758000 task.ti: > ffff880405758000 > [ 30.769896] RIP: 0010:[<ffffffff8161b125>] [<ffffffff8161b125>] > usb_ifnum_to_if+0x25/0x60 > [ 30.769900] RSP: 0018:ffff88040575bab0 EFLAGS: 00010202 > [ 30.769902] RAX: 0000000000000000 RBX: ffff88040b19b000 RCX: > ffff8804049ab000 > [ 30.769904] RDX: ffff8804055968f8 RSI: 0000000000000001 RDI: > 0000000000000004 > [ 30.769906] RBP: ffff88040575bab0 R08: 00000000ffffff92 R09: > 00000001820001bc > [ 30.769908] R10: ffff88040b1b5958 R11: 0000000000000000 R12: > ffff8804049aec00 > [ 30.769910] R13: 0000000000000000 R14: 0000000000000000 R15: > ffff88040b0a6000 > [ 30.769912] FS: 00007f7ba7fff700(0000) GS:ffff88041dc40000(0000) > knlGS:0000000000000000 > [ 30.769915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 30.769917] CR2: 0000000000000000 CR3: 0000000404a82000 CR4: > 00000000001406e0 > [ 30.769919] Stack: > [ 30.769921] ffff88040575bb08 ffffffff81627a6e ffff88040b19b000 > ffff880405596808 > [ 30.769925] 0000000000000001 ffff8804055968f8 ffff88040b19b000 > ffff8804049aec00 > [ 30.769928] ffff8804055968f8 0000000000000000 ffff88040b0a6000 > ffff88040575bb60 > [ 30.769932] Call Trace: > [ 30.769936] [<ffffffff81627a6e>] usb_hcd_alloc_bandwidth+0x1fe/0x310 > [ 30.769938] [<ffffffff8162ac1d>] usb_set_interface+0xfd/0x390 > [ 30.769943] [<ffffffffc07f60c2>] uvc_init_video+0x152/0x4c0 [uvcvideo] > [ 30.769947] [<ffffffffc07f8203>] uvc_video_enable+0xf3/0x180 [uvcvideo] > [ 30.769950] [<ffffffffc07f32af>] uvc_start_streaming+0x2f/0x70 > [uvcvideo] > [ 30.769953] [<ffffffffc0693c00>] vb2_start_streaming+0x70/0x180 > [videobuf2_core] > [ 30.769957] [<ffffffffc0694268>] vb2_core_streamon+0xf8/0x130 > [videobuf2_core] > [ 30.769961] [<ffffffffc0809e09>] vb2_streamon+0x29/0x50 > [videobuf2_v4l2] > [ 30.769964] [<ffffffffc07f370e>] uvc_queue_streamon+0x2e/0x50 > [uvcvideo] > [ 30.769967] [<ffffffffc07f45fc>] uvc_ioctl_streamon+0x3c/0x60 > [uvcvideo] > [ 30.769973] [<ffffffffc083498a>] v4l_streamon+0x1a/0x20 [videodev] > [ 30.769978] [<ffffffffc08382c1>] __video_do_ioctl+0x291/0x310 > [videodev] > [ 30.769982] [<ffffffffc0837d96>] video_usercopy+0x336/0x5b0 [videodev] > [ 30.769986] [<ffffffffc0838030>] ? video_ioctl2+0x20/0x20 [videodev] > [ 30.769989] [<ffffffff811c26a7>] ? handle_mm_fault+0x1277/0x1820 > [ 30.769994] [<ffffffffc0838025>] video_ioctl2+0x15/0x20 [videodev] > [ 30.769997] [<ffffffffc08336a3>] v4l2_ioctl+0xd3/0xe0 [videodev] > [ 30.770000] [<ffffffff81223fbf>] do_vfs_ioctl+0x29f/0x490 > [ 30.770004] [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400 > [ 30.770006] [<ffffffff81224229>] SyS_ioctl+0x79/0x90 > [ 30.770010] [<ffffffff818437f2>] entry_SYSCALL_64_fastpath+0x16/0x71 > [ 30.770012] Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 98 03 00 > 00 48 89 e5 48 85 c9 74 46 0f b6 79 04 40 84 ff 74 3d 48 8b 81 98 00 00 > 00 <48> 8b 10 0f b6 52 02 39 d6 74 2d 8d 47 ff 48 8d 91 a0 00 00 00 > [ 30.770030] RIP [<ffffffff8161b125>] usb_ifnum_to_if+0x25/0x60 > [ 30.770033] RSP <ffff88040575bab0> > [ 30.770035] CR2: 0000000000000000 > [ 30.770037] ---[ end trace 4279e1387d74d705 ]--- > > In all cases the Logitech Webcam C930e drops from the bus and then the > UVC access breaks. The code looks like this and is unchanged (even the > paths leading up to it) in mainline: Was 3-2 the Logitech webcam device? Is that the device the uvc driver was trying to access when the NULL pointer dereference occurred? > struct usb_interface *usb_ifnum_to_if(const struct usb_device *dev, > unsigned ifnum) > { > struct usb_host_config *config = dev->actconfig; > int i; > > if (!config) > return NULL; > for (i = 0; i < config->desc.bNumInterfaces; i++) > if (config->interface[i]->altsetting[0] > .desc.bInterfaceNumber == ifnum) > return config->interface[i]; > > return NULL; > } > > I tracked it down to config->interface[i]->altsetting[0] being NULL and You're slightly confused. config->interface[i]->altsetting is a pointer to an array of structures, and config->interface[i]->altsetting[0] is the first structure in that array. Since it's a structure and not a pointer, it can't possibly be NULL. Perhaps you meant that config->interface[i]->altsetting is NULL. Or maybe config->interface[i] is NULL. > the unguarded access of desc failing. Curiously enough when invoked from > usb_set_interface it's still valid (as usb_ifnum_to_if is called at its > top). Now my question is: Assuming this is some kind of race > (usb_set_interface takes the hcd->bandwidth_mutex) with the removal: Are you asking whether the bad access at timestamp 30.769840 raced with the device removal at timestamp 25.774914? Offhand I'd say that this is most unlikely, since there is a 5-second interval between the two events. > is > altsetting[0] supposed to be guaranteed to be always non-NULL? interface[i]->altsetting is indeed guaranteed always to be non-NULL. In fact, I don't think it ever gets assigned to after it has been initialized. > usb_hcd_alloc_bandwidth seems to intend to manipulate it. No, it isn't. I don't see why you would say that. > If not, is the > appropriate fix here to just insert a NULL check against altsetting[0] > and continue the loop? No. If the bad access occurred because the uvc driver tried to access a device that had been removed 5 seconds earlier, the most likely explanation is a reference-counting error of some sort. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html