Hi Linus, On 27-10-18 18:08, Linus Torvalds wrote:
Julian, Jiri, On my laptop I'm getting a kernel page fault with the current git tree, and I'm tentatively blaming commit 9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices") but that's simply because it's the only thing that seems to touch this particular area in this merge window. The oops looks like this: BUG: unable to handle kernel paging request at 000000007a25d598 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 888 Comm: systemd-udevd Not tainted 4.19.0-07715-g345671ea0f92 #4 Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.7.0 01/16/2018 RIP: 0010:strstr+0x19/0x70 where the code disassembly (and the register contents) shows that the wild pointer is the first argument to "strstr()", which just has a bogus value that is not a valid kernel pointer (RDI: 000000007a25d598 - which is obviously also the address of the page fault) The call trace is: dmi_matches+0x55/0xc0 dmi_first_match+0x26/0x40 i2c_hid_get_dmi_i2c_hid_desc_override+0x16/0x40 [i2c_hid] i2c_hid_probe+0x28c/0x760 [i2c_hid] i2c_device_probe+0x1e7/0x260 really_probe+0xf8/0x3e0 driver_probe_device+0x10f/0x120 bus_for_each_drv+0x66/0xb0 __device_attach+0xd9/0x150 bus_probe_device+0x8a/0xa0 device_add+0x48e/0x660 i2c_new_device+0x162/0x350 which is why I suspect that new i2c_hid_get_dmi_hid_report_desc_override() code. I *think* the problem is that the i2c_hid_dmi_desc_override_table[] isn't terminated by a NULL entry, and I will test that next.
Yes that likely is the problem. I already had a bug report from one of the Manjaro maintainers who was cherry picking this into the Manjaro kernel. So I ran some tests on a laptop of mine which does use i2c-hid but I failed to reproduce the issue, so we both (me and the Manjaro maintainer) both assumed something went wrong with the backport. Both of us seem to have overlooked the missing terminating entry, as well as other people involved in the patch.
What makes me *very* unhappy about this is that if I'm right, I think it means that code was literally not tested at all by anybody who didn't have one of the entries in that list.
That is not true, I've hit one of these unterminated dmi lists issues before and it depends on what get put in mem directly after the list by the linker, bugs caused by this do not always reproduce unfortunately. And as mentioned I have tested the patch on a machine with an i2c-hid touchpad, which is not on the list and I did not hit this problem. Anyways this is fixed now, thank you for catching this. Regards, Hans