Hi Benjamin, On 10/6/23 19:24, Hans de Goede wrote: > Hi, > > On 10/6/23 18:28, Benjamin Tissoires wrote: >> On Oct 06 2023, Hans de Goede wrote: > > <snip> > >>>>> @@ -4207,36 +4208,39 @@ static void hidpp_connect_event(struct hidpp_device *hidpp) >>>>> return; >>>>> } >>>>> >>>>> + /* Avoid probe() restarting IO */ >>>>> + mutex_lock(&hidpp->io_mutex); >>>> >>>> I'd put a `__must_hold(&hidpp->io_mutex);` here, not changing any return >>>> path and forcing any caller to `hidpp_connect_event()` (which will >>>> eventually only be the work struct) to take the lock. >>>> >>>> This should simplify the patch by a lot and also ensure someone doesn't >>>> forget the `goto out_unlock`. >>> >>> Ok, I can add the __must_hold() here and make >>> delayed_Work_cb take the lock, but that would make it >>> impossible to implement patch 2/2 in a clean manner and >>> I do like patch 2/2 since it makes it clear that >>> hidpp_connect_event must only run from the workqueue >>> but I guess we could just add a comment for that >>> instead. >> >> In 2/2, just rename this function to __do_hidpp_connect_event(), and >> have hidpp_connect_event() being the worker, which takes the lock, and >> calls __do_hidpp_connect_event(). > > Ok, will do for v2. > > <snip> > >>>>> @@ -4519,6 +4526,9 @@ static int hidpp_probe(struct hid_device *hdev, const struct hid_device_id *id) >>>>> flush_work(&hidpp->work); >>>>> >>>>> if (will_restart) { >>>>> + /* Avoid hidpp_connect_event() running while restarting */ >>>>> + mutex_lock(&hidpp->io_mutex); >>>>> + >>>>> /* Reset the HID node state */ >>>>> hid_device_io_stop(hdev); >>>> >>>> That's the part that makes me raise an eyebrow. Because we lock, then >>>> release the semaphore to get it back later. Can this induce a dead lock? >>>> >>>> Can't we solve that same scenario without a mutex, but forcing either >>>> the workqueue to not run or to be finished at this point? >>> >>> I'm not sure what you are worried about after the mutex_lock >>> the line above we are 100% guaranteed that hidpp_connect_event() >>> is not running and since it is not running it will also not >>> be holding any other locks, so it can not cause any problems. >> >> Agree, but my point is that you are not entirely solving the issue: >> if now, between hid_device_io_stop() and hid_hw_close() we receive a >> connect notification from the device, hid_input_report() will return >> -EBUSY, and we will lose it (it will not be stacked in the workqueue). >> >> I was thinking at adding a flush_work(&hidpp->work) here, instead of >> the mutex solution, but yours ensures that any connect event already >> started will be handled properly, which is a plus. >> >> Still if between the mutex lock here we receive a connect event from the >> device, we still get -EBUSY at the hid-core layer, and so we will lose >> it. Maybe that's OK because we might re-ask for the device later (I >> don't remember exactly the code), but my point is that because we add a >> mutex doesn't mean we will solve all multi-thread problems. So finding a >> non-mutex solution sometimes is better :) >> >> And the fact that we need to think through every preemption case often >> means that there is something wrong *elsewhere*. > > Right, I did consider seeing if we can get rid of the restart > altogether, as the whole restarting thing is actually the problem > here. AFAICT this is only really necessary in the WTP path since > there where we need to know resolution before instantiating the > input device. > > But atm this is also done for all unifying devices, which seems > unnecessary. > > Buy we still need the restart anyways for the WTP case, > so we need to make it work reliable anyways. > > Now that I understand your concern about the missed connect > packet, which I agree is a real concern I think I know how to > fix this. I'll prepare a new version of this series tomorrow. > > Hmm, thinking more about this, if we normally just create > the input device right away even for unifying devices and > we already always delay the creation for WTP even during > the restart: > > if (hidpp->quirks & HIDPP_QUIRK_DELAYED_INIT) > connect_mask &= ~HID_CONNECT_HIDINPUT; > > Then I wonder why do we even bother to do the restart > thing for unifying devices. Do you know what this is based on ? > > I guess this might have to do with ensuring the configure > commands are send before creating the input + hidraw > devices, but if the connect event comes later on then > the configuration is already done later on after > the input device has already been created ? > > So maybe we should indeed just remove the whole restart > thing entirely and also always rely on hidpp_connect_event > to send the configuration commands, because currently > those are send twice if the device is already connnected > at probe() time. The whole restart thing seems to have been added by you in: 91cf9a98ae4127 ("HID: logitech-hidpp: make .probe usbhid capable") The commit message says that not starting all the HID subdrivers right from the start is necessary because: "It means that any configuration retrieved after the initial hid_hw_start would not be exposed to user space without races." I believe this refers to the filling of hdev->name and hdev->uniq by hidpp_unifying_init() (or other similar code paths for non unifying). It seems that has been broken though by: 498ba20690357 ("HID: logitech-hidpp: Don't restart communication if not necessary") Which causes input-dev registration to happen before hidpp_unifying_init(). TL;DR: 1. There seems to be a good reason for the restart dance so we need to fix it rather then remove it. 2. 498ba20690357 ("HID: logitech-hidpp: Don't restart communication if not necessary") (from Bastien) re-introduces the race which the restart tries to fix and should be reverted. Regards, Hans