Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works

Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> · Sat, 3 Sep 2022 19:11:18 -0700

Hi Tetsuo,

On Fri, Sep 2, 2022 at 11:49 PM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2022/09/03 6:31, Luiz Augusto von Dentz wrote:
> > Hi Tetsuo,
> >
> > On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz <luiz.dentz@xxxxxxxxx> wrote:
> >> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
> >> after the cancel pattern?
>
> HCI_CMD_DRAIN_WORKQUEUE does not help for this case.
>
> What extid=243b7d89777f90f7613b is reporting is
>
>   hci_cmd_timeout() {                             hci_dev_do_reset() {
>     starts sleeping due to e.g. preemption
>                                                     hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); // Sets HCI_CMD_DRAIN_WORKQUEUE flag
>                                                     cancel_delayed_work(&hdev->cmd_timer); // does nothing because hci_cmd_timeout() is already running
>                                                     cancel_delayed_work(&hdev->ncmd_timer);
>                                                     drain_workqueue(hdev->workqueue) {
>                                                       sets __WQ_DRAINING flag on hdev->workqueue
>                                                       starts waiting for completion of all works on hdev->workqueue
>     finishes sleeping due to e.g. preemption
>     queue_work(hdev->workqueue,  &hdev->cmd_work) // <= complains attempt to queue work from system_wq into __WQ_DRAINING hdev->workqueue

And we can check for __WQ_DRAINING? Anyway checking
HCI_CMD_DRAIN_WORKQUEUE seems useless so we either have to check if
queue_work can be used or not.

>   }
>                                                       finishes waiting for completion of all works on hdev->workqueue
>                                                       clears __WQ_DRAINING flag
>                                                     }
>                                                   }
>
> race condition. Notice that cancel_delayed_work() does not wait for
> completion of already started hci_cmd_timeout() callback.
>
> If you need to wait for completion of already started callback,
> you need to use _sync version (e.g. cancel_delayed_work_sync()).
> And watch out for locking dependency when using _sync version.
>
> >>                           I wonder if wouldn't be better to introduce
> >> some function that disables/enables the workqueue so we don't have to
> >> do extra tracking in the driver/subsystem?
> >>
> >
> > I was thinking on doing something like the following:
> >
> > https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16
>
> That patch does not close race, for
>
> @@ -4037,6 +4038,10 @@ static void hci_cmd_work(struct work_struct *work)
>         BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
>                atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
>
> +       /* Don't queue while draining */
> +       if (hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> +               return;
>         /*
>          * BUG: WE ARE FREE TO SLEEP FOR ARBITRARY DURATION IMMEDIATELY AFTER CHECKING THE FLAG.
>          * ANY "TEST AND DO SOMETHING" NEEDS TO BE PROTECTED BY A LOCK MECHANISM.
>          */

Then we need a lock not a flag.

>         /* Send queued commands */
>         if (atomic_read(&hdev->cmd_cnt)) {
>                 skb = skb_dequeue(&hdev->cmd_q);
>
> . In other words, HCI_CMD_DRAIN_WORKQUEUE does not fix what extid=63bed493aebbf6872647 is reporting.
>
> If "TEST AND DO SOMETHING" does not sleep, RCU is a handy lock mechanism.
>
> >
> > Since there is no reason to queue any command if we are draining and
> > are gonna reset at the end it is pretty useless to queue commands at
> > that point.
>
> Then, you can add that check.
>

-- 
Luiz Augusto von Dentz