On Fri, Jul 13, 2018 at 10:51:27AM +0200, Benjamin Tissoires wrote: > On Fri, Jul 13, 2018 at 10:03 AM Dmitry Torokhov > <dmitry.torokhov@xxxxxxxxx> wrote: > > > > On July 12, 2018 12:36:53 PM PDT, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > >Dmitry, > > > I've had my X setup lock up a couple of times now, so I enabled > > >pretty much every single debug option in the kernel, and tried to see > > >if I could trigger anything. So I turned on list debugging, > > >DEBUG_PAGEALLOC, kobject debugging, lockdep and sleep debugging etc. > > > > > >What I *did* trigger with all those options was a dead mouse. I'm not > > >sure which debug option causes the dead mouse for me, but it's likely > > >one of the DEBUG_KOBJECT_RELEASE. I'll try to figure that out more. > > > > > >And trying to unplug and re-plug the mouse, I got this: > > > > > > general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC PTI > > > CPU: 3 PID: 147 Comm: kworker/3:1 Not tainted > > >4.18.0-rc4-00071-g63f047771621 #2 > > > Hardware name: System manufacturer System Product Name/Z170-K, BIOS > > >3301 02/08/2017 > > > Workqueue: usb_hub_wq hub_event > > > RIP: 0010:strlen+0x0/0x20 > > > Code: f6 82 40 63 ef 9d 20 74 14 48 c7 c1 40 63 ef 9d 48 83 c0 01 0f > > >b6 10 f6 04 11 20 75 f3 f3 c3 90 66 2e 0f 1f 84 00 00 00 00 00 <80> 3f > > >00 48 89 f8 74 10 48 83 c7 01 80 3f > > > RSP: 0018:ffff91fc42eff840 EFLAGS: 00010286 > > > RAX: 0000000000000007 RBX: 000000000000001a RCX: 0000000000000000 > > > RDX: 0000000000000000 RSI: 00000000006000c0 RDI: dead4ead00000000 > > > RBP: ffff889396a386f8 R08: 00000000919b9af4 R09: 0000000000000001 > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff889396b36688 > > > R13: 00000000006000c0 R14: 00000000fffffffe R15: ffff8893a2717a98 > > >FS: 0000000000000000(0000) GS:ffff8893b4ec0000(0000) > > >knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033aminamin.tissoires > > > CR2: 00007f896b883940 CR3: 0000000202210003 CR4: 00000000003606e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > Call Trace: > > > kobject_get_path+0x1d/0xe0 > > > kobject_uevent_env+0xff/0x660 > > > led_trigger_set+0x106/0x250 > > > led_classdev_unregister+0x2d/0xc0 > > > input_leds_disconnect+0x38/0x70 > > > __input_unregister_device+0xac/0x170 > > > input_unregister_device+0x41/0x60 > > > hidinput_disconnect+0x75/0xe0 > > > hid_disconnect+0x5f/0x70 > > > hid_hw_stop+0xe/0x30 > > > hidpp_remove+0x39/0xd0 [hid_logitech_hidpp] > > > hid_device_remove+0x4f/0xb0 > > > device_release_driver_internal+0x196/0x260 > > > bus_remove_device+0xff/0x170 > > > device_del+0x147/0x370 > > > hid_destroy_device+0x22/0x60 > > > logi_dj_remove+0x52/0xc0 [hid_logitech_dj] > > > hid_device_remove+0x4f/0xb0 > > > device_release_driver_internal+0x196/0x260 > > > bus_remove_device+0xff/0x170 > > > device_del+0x147/0x370 > > > hid_destroy_device+0x22/0x60 > > > usbhid_disconnect+0x43/0x60 > > > usb_unbind_interface+0x7c/0x2a0 > > > device_release_driver_internal+0x196/0x260 > > > bus_remove_device+0xff/0x170 > > > device_del+0x147/0x370 > > > usb_disable_device+0x96/0x260 > > > usb_disconnect+0xbe/0x2a0 > > > hub_event+0x553/0x1620 > > > process_one_work+0x272/0x6b0 > > > worker_thread+0x3c/0x390 > > > kthread+0x11e/0x140 > > > ret_from_fork+0x3a/0x50 > > > Modules linked in: rfcomm fuse xt_CHECKSUM ipt_MASQUERADE tun > > >nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter > > >ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set > > > ftdi_sio snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel > > >snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm > > >mei_me snd_timer snd mei i2c_i801 soundc > > > ---[ end trace 797ae088d6f38ed9 ]--- > > > RIP: 0010:strlen+0x0/0x20 > > > Code: f6 82 40 63 ef 9d 20 74 14 48 c7 c1 40 63 ef 9d 48 83 c0 01 0f > > >b6 10 f6 04 11 20 75 f3 f3 c3 90 66 2e 0f 1f 84 00 00 00 00 00 <80> 3f > > >00 48 89 f8 74 10 48 83 c7 01 80 3f > > > RSP: 0018:ffff91fc42eff840 EFLAGS: 00010286 > > > RAX: 0000000000000007 RBX: 000000000000001a RCX: 0000000000000000 > > > RDX: 0000000000000000 RSI: 00000000006000c0 RDI: dead4ead00000000 > > > RBP: ffff889396a386f8 R08: 00000000919b9af4 R09: 0000000000000001 > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff889396b36688 > > > R13: 00000000006000c0 R14: 00000000fffffffe R15: ffff8893a2717a98 > > >FS: 0000000000000000(0000) GS:ffff8893b4ec0000(0000) > > >knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 00007f896b883940 CR3: 0000000202210003 CR4: 00000000003606e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > >That's 'strlen()' on a bogus pointer. In this case, the pointer in > > >question is in %rdi, and it's 0xdead4ead00000000, which happens to be > > >SPINLOCK_MAGIC in the high bits. > > > > > >In get_kobj_path_length(), we have code like this: > > > > > > if (kobject_name(parent) == NULL) > > > return 0; > > > length += strlen(kobject_name(parent)) + 1; > > > > > >and clearly "kobject_name(parent)" isn't NULL, because it's a wild > > >invalid pointer. > > > > > >Now, "kobject_name()" loads the first word in the kobject as a > > >pointer. So this smells like this is no longer a kobject, but > > >something that has spinlock in the first word (with > > >CONFIG_DEBUG_SPINLOCK, a spinlock first has the 32-bit > > >"arch_spinlock_t", followed by a "unsigned int magic", so that would > > >explain the SPINLOCK_MAGIC in the high bits. > > > > > >(Adding Greg too, just in case it's some kobject issue). > > > > > >Does this look at all familiar? Anything that has changed in this area > > >recently? > > > > > > Hmm, I have not seen this one yet. I see there is HID++ and Logitech drivers involved, let's add Jiri and Benjamin. > > That is news to me too. > Concerning the changes, neither hid-logitech-dj nor hid-logitech-hidpp > have been updated recently. The one important thing that came in > recently in the HID tree was the fact that hid-generic now binds > before the specific drivers, and unbind when the specific driver gets > loaded. This was introduced in v4.16, and fixed in v4.17 and v4.18 to > avoid having a situation where hid expects a driver to be loaded when > it is not compiled in or not available because we are in the initrd. > However, this change doesn't seem to be at the root of the problem > because the stack trace starts with the usb removal of the device, so > hid-generic has long be gone. > > But the more worrying part is that the input-leds code hasn't see any > changes recently too, nor the led class. > So my bet is that you are triggering an old bug no one saw before. OK, I see it, it is busted conversion to struct_size(). The macro expects that the array is an array of pointers, whereas in input leds helper we have an array of real structs, and struct_size() calculates too small size for them. I'd recommend reverting: commit acafe7e30216166a17e6e226aadc3ecb63993242 Author: Kees Cook <keescook@xxxxxxxxxxxx> Date: Tue May 8 13:45:50 2018 -0700 treewide: Use struct_size() for kmalloc()-family as used Coccinelle scripts seem to be too greedy. Thanks. -- Dmitry -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html