On Tue, May 23, 2023 at 2:31 PM Jiri Kosina <jikos@xxxxxxxxxx> wrote: > > On Mon, 22 May 2023, Linus Torvalds wrote: > > > > FWIW, in case anybody is interested in a status update: one reporter > > > bisected the problem down to 586e8fede79 ("HID: logitech-hidpp: Retry > > > commands when device is busy"); reverting that commit on-top of 6.3 > > > fixes the problem for that reporter. For that reporter things also work > > > on 6.4-rc; but for someone else that is affected that's not the case. > > FWIW, I was pretty much away for past few weeks as well, same as Benjamin > as Bastien. Which is unfortunate timing, but that's how things pan out > sometimes. > > > Hmm. It's likely timing-dependent. > > > > But that code is clearly buggy. > > > > If the wait_event_timeout() returns early, the device hasn't replied, > > but the code does > > > > if (!wait_event_timeout(hidpp->wait, hidpp->answer_available, > > 5*HZ)) { > > dbg_hid("%s:timeout waiting for response\n", __func__); > > memset(response, 0, sizeof(struct hidpp_report)); > > ret = -ETIMEDOUT; > > } > > > > and then continues to look at the response _anyway_. > > Yeah; we are zeroing it out, but that doesn't really make things any > better in principle, given all the dereferences later. > > The issue seems to be existing ever since 2f31c52529 ("HID: Introduce > hidpp, a module to handle Logitech hid++ devices") when this whole driver > was introduced, as far as I can tell. Yep, that was on me. But the weird part is that I should be able to reproduce this locally then, but I don't. > > > Now, depending on out hardening options, that response may have been > > initialized by the compiler, or may just be random stack contents. > > Again, as in case of timeout the buffer is just zeroed out, I'd just much > more expect NULL pointer dereference in such case. Which is not what we > are seeing here. Returning -ETIMEDOUT seems good to me FWIW. > > > That bug is pre-existing (ie the problem was not introduced by that > > commit), but who knows if the retry makes things worse (ie if it then > > triggers on a retry, the response data will be the *previous* response). > > > > The whole "goto exit" games should be removed too, because we're in a > > for-loop, and instead of "goto exit" it should just do "break". > > > > IOW, something like this might be worth testing. > > > > That said, while I think the code is buggy, I doubt this is the actual > > cause of the problem people are reporting. But it would be lovely to > > hear if the attached patch makes any difference, and I think this is > > fixing a real - but unlikely - problem anyway. FWIW, Linus, your patch is Reviewed-by: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx> Feel free to submit it to us or to apply it directly if you prefer as this is clearly a fix for a code path issue. I am barely struggling with everything now that I am back from last week, being sick at the beginning of the week and still not feeling completely well doesn't help. Cheers, Benjamin > > > > And obviously it might be helpful to actually enable those dbg_hid() > > messages, but I didn't look at what the magic config option to do so > > was. > > dbg_hid() is just pr_debug(), which means that on kernels with > CONFIG_DYNAMIC_DEBUG, this makes use of the dynamic debug facility; > otherwsie it just becomes printk(KERN_DEBUG...). > > Thanks, > > -- > Jiri Kosina > SUSE Labs >