Hi Tetsuo, On Fri, Sep 2, 2022 at 11:49 PM Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote: > > On 2022/09/03 6:31, Luiz Augusto von Dentz wrote: > > Hi Tetsuo, > > > > On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote: > >> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing > >> after the cancel pattern? > > HCI_CMD_DRAIN_WORKQUEUE does not help for this case. > > What extid=243b7d89777f90f7613b is reporting is > > hci_cmd_timeout() { hci_dev_do_reset() { > starts sleeping due to e.g. preemption > hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); // Sets HCI_CMD_DRAIN_WORKQUEUE flag > cancel_delayed_work(&hdev->cmd_timer); // does nothing because hci_cmd_timeout() is already running > cancel_delayed_work(&hdev->ncmd_timer); > drain_workqueue(hdev->workqueue) { > sets __WQ_DRAINING flag on hdev->workqueue > starts waiting for completion of all works on hdev->workqueue > finishes sleeping due to e.g. preemption > queue_work(hdev->workqueue, &hdev->cmd_work) // <= complains attempt to queue work from system_wq into __WQ_DRAINING hdev->workqueue And we can check for __WQ_DRAINING? Anyway checking HCI_CMD_DRAIN_WORKQUEUE seems useless so we either have to check if queue_work can be used or not. > } > finishes waiting for completion of all works on hdev->workqueue > clears __WQ_DRAINING flag > } > } > > race condition. Notice that cancel_delayed_work() does not wait for > completion of already started hci_cmd_timeout() callback. > > If you need to wait for completion of already started callback, > you need to use _sync version (e.g. cancel_delayed_work_sync()). > And watch out for locking dependency when using _sync version. > > >> I wonder if wouldn't be better to introduce > >> some function that disables/enables the workqueue so we don't have to > >> do extra tracking in the driver/subsystem? > >> > > > > I was thinking on doing something like the following: > > > > https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16 > > That patch does not close race, for > > @@ -4037,6 +4038,10 @@ static void hci_cmd_work(struct work_struct *work) > BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name, > atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q)); > > + /* Don't queue while draining */ > + if (hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE)) > + return; > /* > * BUG: WE ARE FREE TO SLEEP FOR ARBITRARY DURATION IMMEDIATELY AFTER CHECKING THE FLAG. > * ANY "TEST AND DO SOMETHING" NEEDS TO BE PROTECTED BY A LOCK MECHANISM. > */ Then we need a lock not a flag. > /* Send queued commands */ > if (atomic_read(&hdev->cmd_cnt)) { > skb = skb_dequeue(&hdev->cmd_q); > > . In other words, HCI_CMD_DRAIN_WORKQUEUE does not fix what extid=63bed493aebbf6872647 is reporting. > > If "TEST AND DO SOMETHING" does not sleep, RCU is a handy lock mechanism. > > > > > Since there is no reason to queue any command if we are draining and > > are gonna reset at the end it is pretty useless to queue commands at > > that point. > > Then, you can add that check. > -- Luiz Augusto von Dentz