On Jun 05 2023, Mark Lord wrote: > > On 2023-06-05 10:27 AM, Benjamin Tissoires wrote: > > > > On Jun 05 2023, Linux regression tracking (Thorsten Leemhuis) wrote: > >> > >> On 03.06.23 14:41, Jiri Kosina wrote: > >>> On Wed, 31 May 2023, Jiri Kosina wrote: > >>> > >>>>> If an attempt at contacting a receiver or a device fails because the > >>>>> receiver or device never responds, don't restart the communication, only > >>>>> restart it if the receiver or device answers that it's busy, as originally > >>>>> intended. > >>>>> > >>>>> This was the behaviour on communication timeout before commit 586e8fede795 > >>>>> ("HID: logitech-hidpp: Retry commands when device is busy"). > >>>>> > >>>>> This fixes some overly long waits in a critical path on boot, when > >>>>> checking whether the device is connected by getting its HID++ version. > >>>>> > >>>>> Signed-off-by: Bastien Nocera <hadess@xxxxxxxxxx> > >>>>> Suggested-by: Mark Lord <mlord@xxxxxxxxx> > >>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy") > >>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217412 > >>> [...] > >>>> > >>>> I have applied this even before getting confirmation from the reporters in > >>>> bugzilla, as it's the right thing to do anyway. > >>> > >>> Unfortunately it doesn't seem to cure the reported issue (while reverting > >>> 586e8fede79 does): > >> > >> BTW, remind me again: was fixing this by reverting 586e8fede79 for now a > >> option? I guess it's not, but if I'm wrong I wonder if that might at > >> this point be the best way forward. > > > > Could be. I don't think we thought at simply reverting it because it is > > required for some new supoprted devices because they might differ > > slightly from what we currently supported. > > > > That being said, Bastien will be unavailable for at least a week AFAIU, > > so maybe we should revert that patch. > > > >> > >>> https://bugzilla.kernel.org/show_bug.cgi?id=217523#c2 > > Pardon me, but I don't understand what that "retry" loop > is even attempting to do inside function hidpp_send_message_sync(). > > It appears to unconditionally loop until: > (1) the __hidpp_send_report() fails, > or (2) it gets a HIDPP_ERROR, > or (3) it gets a HIDPP20_ERROR other than HIDPP20_ERROR_BUSY, > or (4) until it has looped 3 times, which appears to be the normal exit. > > It doesn't seem to have any provision to exit the loop earlier on "success" > (whatever that is). > > And so when it finally does exit after the 3 iterations, > it then returns the last value of "ret", > which will be zero from the __hidpp_send_report() call, > or sometimes the most recent non-BUSY HIDPP20_ERROR seen. > > Obviously I'm missing something, as otherwise this code would never have > passed review and made it into the Linux kernel in the first place. Right? Ouch, very much ouch :( So that one is on me, sorry, I completely missed that and trusted the local tests. We are human and can make mistakes. And TBH a lot of people have been staring at this code for a while now without noticing that. Would you mind giving a shot at the following (untested) patch: --- >From 7d03a6b765571d7c518c85e7e74f6f9d498fa354 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx> Date: Mon, 5 Jun 2023 18:56:36 +0200 Subject: [PATCH] HID: hidpp: terminate retry loop on success It seems we forgot the normal case to terminate the retry loop, making us asking 3 times each command, which is probably a little bit too much. And remove the ugly "goto exit" that can be replaced by a simpler "break" Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy") Suggested-by: Mark Lord <mlord@xxxxxxxxx> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx> --- drivers/hid/hid-logitech-hidpp.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c index 2246044b1639..5e1a412fd28f 100644 --- a/drivers/hid/hid-logitech-hidpp.c +++ b/drivers/hid/hid-logitech-hidpp.c @@ -286,7 +286,7 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp, struct hidpp_report *message, struct hidpp_report *response) { - int ret; + int ret = -1; int max_retries = 3; mutex_lock(&hidpp->send_mutex); @@ -300,13 +300,13 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp, */ *response = *message; - for (; max_retries != 0; max_retries--) { + for (; max_retries != 0 && ret; max_retries--) { ret = __hidpp_send_report(hidpp->hid_dev, message); if (ret) { dbg_hid("__hidpp_send_report returned err: %d\n", ret); memset(response, 0, sizeof(struct hidpp_report)); - goto exit; + break; } if (!wait_event_timeout(hidpp->wait, hidpp->answer_available, @@ -314,14 +314,14 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp, dbg_hid("%s:timeout waiting for response\n", __func__); memset(response, 0, sizeof(struct hidpp_report)); ret = -ETIMEDOUT; - goto exit; + break; } if (response->report_id == REPORT_ID_HIDPP_SHORT && response->rap.sub_id == HIDPP_ERROR) { ret = response->rap.params[1]; dbg_hid("%s:got hidpp error %02X\n", __func__, ret); - goto exit; + break; } if ((response->report_id == REPORT_ID_HIDPP_LONG || @@ -330,13 +330,12 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp, ret = response->fap.params[1]; if (ret != HIDPP20_ERROR_BUSY) { dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret); - goto exit; + break; } dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret); } } -exit: mutex_unlock(&hidpp->send_mutex); return ret; -- 2.40.1 --- > > What is this code trying to do? And what am I not seeing? It was supposed to retry on specific errors only, but it was retrying on success too... Cheers, Benjamin