Hi Alan,
first of all thanks for your reply. My apologies for trying to condense
too much and trying to read an unfamiliar codebase wrongly. :/
On 2017-11-08 17:08, Alan Stern wrote:
On Wed, 8 Nov 2017, Philipp Kern wrote:
Hi,
I encountered a bunch of NULL pointer dereferences with the same stack
trace:
[ 25.774914] usb 3-2: USB disconnect, device number 3
[ 30.769840] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 30.769846] IP: [<ffffffff8161b125>] usb_ifnum_to_if+0x25/0x60
[ 30.769851] PGD 0
[ 30.769852] Oops: 0000 [#1] SMP
[ 30.769854] Modules linked in: nvidia_uvm(POE) nvidia(POE)
snd_hda_codec_hdmi btusb btrtl btbcm ppdev btintel joydev bnep
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp uvcvideo
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
v4l2_common kvm videodev ftdi_sio media snd_usb_audio bluetooth
usbserial irqbypass snd_usbmidi_lib crct10dif_pclmul crc32_pclmul
snd_rawmidi ghash_clmulni_intel snd_seq_device serio_raw aesni_intel
hid_multitouch aes_x86_64 lrw snd_hda_codec_realtek gf128mul
glue_helper
ablk_helper cryptd snd_hda_codec_generic drm lpc_ich snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep snd_pcm parport_pc parport
8250_fintek snd_timer snd wmi soundcore mei_me shpchp udlfb(OE) mei
mac_hid fb_sys_fops syscopyarea sysfillrect sysimgblt hid_sis(OE)
hid_mml(OE) 99xx(OE) hid_generic usbhid hid e1000e psmouse ahci
libahci
ptp pps_core fjes video
[ 30.769890] CPU: 1 PID: 1597 Comm: [redacted] Tainted: P OE
4.4.0-97-generic #120-Ubuntu
[ 30.769893] Hardware name: LENOVO 30AJS07000/SHARKBAY, BIOS FBKTB6AUS
08/02/2015
[ 30.769894] task: ffff880407beaa00 ti: ffff880405758000 task.ti:
ffff880405758000
[ 30.769896] RIP: 0010:[<ffffffff8161b125>] [<ffffffff8161b125>]
usb_ifnum_to_if+0x25/0x60
[ 30.769900] RSP: 0018:ffff88040575bab0 EFLAGS: 00010202
[ 30.769902] RAX: 0000000000000000 RBX: ffff88040b19b000 RCX:
ffff8804049ab000
[ 30.769904] RDX: ffff8804055968f8 RSI: 0000000000000001 RDI:
0000000000000004
[ 30.769906] RBP: ffff88040575bab0 R08: 00000000ffffff92 R09:
00000001820001bc
[ 30.769908] R10: ffff88040b1b5958 R11: 0000000000000000 R12:
ffff8804049aec00
[ 30.769910] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff88040b0a6000
[ 30.769912] FS: 00007f7ba7fff700(0000) GS:ffff88041dc40000(0000)
knlGS:0000000000000000
[ 30.769915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.769917] CR2: 0000000000000000 CR3: 0000000404a82000 CR4:
00000000001406e0
[ 30.769919] Stack:
[ 30.769921] ffff88040575bb08 ffffffff81627a6e ffff88040b19b000
ffff880405596808
[ 30.769925] 0000000000000001 ffff8804055968f8 ffff88040b19b000
ffff8804049aec00
[ 30.769928] ffff8804055968f8 0000000000000000 ffff88040b0a6000
ffff88040575bb60
[ 30.769932] Call Trace:
[ 30.769936] [<ffffffff81627a6e>] usb_hcd_alloc_bandwidth+0x1fe/0x310
[ 30.769938] [<ffffffff8162ac1d>] usb_set_interface+0xfd/0x390
[ 30.769943] [<ffffffffc07f60c2>] uvc_init_video+0x152/0x4c0
[uvcvideo]
[ 30.769947] [<ffffffffc07f8203>] uvc_video_enable+0xf3/0x180
[uvcvideo]
[ 30.769950] [<ffffffffc07f32af>] uvc_start_streaming+0x2f/0x70
[uvcvideo]
[ 30.769953] [<ffffffffc0693c00>] vb2_start_streaming+0x70/0x180
[videobuf2_core]
[ 30.769957] [<ffffffffc0694268>] vb2_core_streamon+0xf8/0x130
[videobuf2_core]
[ 30.769961] [<ffffffffc0809e09>] vb2_streamon+0x29/0x50
[videobuf2_v4l2]
[ 30.769964] [<ffffffffc07f370e>] uvc_queue_streamon+0x2e/0x50
[uvcvideo]
[ 30.769967] [<ffffffffc07f45fc>] uvc_ioctl_streamon+0x3c/0x60
[uvcvideo]
[ 30.769973] [<ffffffffc083498a>] v4l_streamon+0x1a/0x20 [videodev]
[ 30.769978] [<ffffffffc08382c1>] __video_do_ioctl+0x291/0x310
[videodev]
[ 30.769982] [<ffffffffc0837d96>] video_usercopy+0x336/0x5b0
[videodev]
[ 30.769986] [<ffffffffc0838030>] ? video_ioctl2+0x20/0x20 [videodev]
[ 30.769989] [<ffffffff811c26a7>] ? handle_mm_fault+0x1277/0x1820
[ 30.769994] [<ffffffffc0838025>] video_ioctl2+0x15/0x20 [videodev]
[ 30.769997] [<ffffffffc08336a3>] v4l2_ioctl+0xd3/0xe0 [videodev]
[ 30.770000] [<ffffffff81223fbf>] do_vfs_ioctl+0x29f/0x490
[ 30.770004] [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
[ 30.770006] [<ffffffff81224229>] SyS_ioctl+0x79/0x90
[ 30.770010] [<ffffffff818437f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 30.770012] Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 98 03
00
00 48 89 e5 48 85 c9 74 46 0f b6 79 04 40 84 ff 74 3d 48 8b 81 98 00
00
00 <48> 8b 10 0f b6 52 02 39 d6 74 2d 8d 47 ff 48 8d 91 a0 00 00 00
[ 30.770030] RIP [<ffffffff8161b125>] usb_ifnum_to_if+0x25/0x60
[ 30.770033] RSP <ffff88040575bab0>
[ 30.770035] CR2: 0000000000000000
[ 30.770037] ---[ end trace 4279e1387d74d705 ]---
In all cases the Logitech Webcam C930e drops from the bus and then the
UVC access breaks. The code looks like this and is unchanged (even the
paths leading up to it) in mainline:
Was 3-2 the Logitech webcam device? Is that the device the uvc driver
was trying to access when the NULL pointer dereference occurred?
Yes.
[ 1.273210] usb 3-2: new high-speed USB device number 3 using
xhci_hcd
[ 2.587518] usb 3-2: New USB device found, idVendor=046d,
idProduct=0843
[ 2.587520] usb 3-2: New USB device strings: Mfr=0, Product=2,
SerialNumber=1
[ 2.587522] usb 3-2: Product: Logitech Webcam C930e
[ 2.587523] usb 3-2: SerialNumber: 167F383E
It is also the only webcam/UVC device that was attached.
struct usb_interface *usb_ifnum_to_if(const struct usb_device *dev,
unsigned ifnum)
{
struct usb_host_config *config = dev->actconfig;
int i;
if (!config)
return NULL;
for (i = 0; i < config->desc.bNumInterfaces; i++)
if (config->interface[i]->altsetting[0]
.desc.bInterfaceNumber == ifnum)
return config->interface[i];
return NULL;
}
I tracked it down to config->interface[i]->altsetting[0] being NULL
and
You're slightly confused. config->interface[i]->altsetting is a
pointer to an array of structures, and
config->interface[i]->altsetting[0] is the first structure in that
array. Since it's a structure and not a pointer, it can't possibly be
NULL. Perhaps you meant that config->interface[i]->altsetting is NULL.
Or maybe config->interface[i] is NULL.
You are right. I got lost in between (mostly also because altsetting is
the first member of that structure). The assembly:
6: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
b: 55 push %rbp
c: 48 8b 8f 98 03 00 00 mov 0x398(%rdi),%rcx
13: 48 89 e5 mov %rsp,%rbp
16: 48 85 c9 test %rcx,%rcx
19: 74 46 je 0x61
1b: 0f b6 79 04 movzbl 0x4(%rcx),%edi
1f: 40 84 ff test %dil,%dil
22: 74 3d je 0x61
24: 48 8b 81 98 00 00 00 mov 0x98(%rcx),%rax
2b:* 48 8b 10 mov (%rax),%rdx <--
trapping instruction
2e: 0f b6 52 02 movzbl 0x2(%rdx),%edx
32: 39 d6 cmp %edx,%esi
34: 74 2d je 0x63
36: 8d 47 ff lea -0x1(%rdi),%eax
39: 48 8d 91 a0 00 00 00 lea 0xa0(%rcx),%rdx
config->interface[0] is 0x98 into the struct usb_host_config. So I guess
it's actually config->interface[0] that is NULL? And then
desc.bInterfaceNumber is 2 into the target of altsetting([0]).
the unguarded access of desc failing. Curiously enough when invoked
from
usb_set_interface it's still valid (as usb_ifnum_to_if is called at
its
top). Now my question is: Assuming this is some kind of race
(usb_set_interface takes the hcd->bandwidth_mutex) with the removal:
Are you asking whether the bad access at timestamp 30.769840 raced with
the device removal at timestamp 25.774914? Offhand I'd say that this
is most unlikely, since there is a 5-second interval between the two
events.
I would have hoped for this answer, I guess. But the fact that it's
preceded by a device removal worried me.
is
altsetting[0] supposed to be guaranteed to be always non-NULL?
interface[i]->altsetting is indeed guaranteed always to be non-NULL.
In fact, I don't think it ever gets assigned to after it has been
initialized.
usb_hcd_alloc_bandwidth seems to intend to manipulate it.
No, it isn't. I don't see why you would say that.
You are right. It's just endpoint management, actually.
If not, is the
appropriate fix here to just insert a NULL check against altsetting[0]
and continue the loop?
No.
If the bad access occurred because the uvc driver tried to access a
device that had been removed 5 seconds earlier, the most likely
explanation is a reference-counting error of some sort.
That sounds reasonable. Is there a way to debug this? RTFM is a fine
response as well.
Kind regards and thanks
Philipp Kern
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html