Re: [PATCH v2] io-wq: fix race between worker exiting and activating free worker

Nadav Amit <nadav.amit@xxxxxxxxx> · Wed, 4 Aug 2021 13:28:31 -0700

> On Aug 4, 2021, at 1:00 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
> Nadav correctly reports that we have a race between a worker exiting
> and new work being queued. This can lead to work being queued behind
> an existing worker that could be sleeping on an event before it can
> run to completion, and hence introduce potentially large latency gaps
> if we hit this race condition:
> 
> cpu0                                    cpu1
> ----                                    ----
>                                         io_wqe_worker()
>                                         schedule_timeout()
>                                         // timed out
> io_wqe_enqueue()
> io_wqe_wake_worker()
> // work_flags & IO_WQ_WORK_CONCURRENT
> io_wqe_activate_free_worker()
>                                         io_worker_exit()
> 
> Fix this by having the exiting worker go through the normal decrement
> of a running worker, which will spawn a new one if needed.
> 
> The free worker activation is modified to only return success if we
> were able to find a sleeping worker - if not, we keep looking through
> the list. If we fail, we create a new worker as per usual.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Link: https://lore.kernel.org/io-uring/BFF746C0-FEDE-4646-A253-3021C57C26C9@xxxxxxxxx/
> Reported-by: Nadav Amit <nadav.amit@xxxxxxxxx>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

Tested-by: Nadav Amit <nadav.amit@xxxxxxxxx>

Thanks!
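
For anyone following the thread, the activation rule described in the
quoted commit message can be sketched roughly as below. This is a
simplified userspace model, not the kernel code: the worker_state enum,
struct worker, and both function names are hypothetical stand-ins, and
locking, RCU, and accounting are left out entirely. It only illustrates
the idea that activation reports success solely when a sleeping worker
was found, skipping workers that are already exiting, and that the
caller otherwise falls back to creating a new worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker states for this model; not the kernel's flags. */
enum worker_state { WORKER_RUNNING, WORKER_SLEEPING, WORKER_EXITING };

struct worker {
    enum worker_state state;
};

/*
 * Scan the free list and wake the first worker that is really asleep.
 * A worker that is already exiting is skipped and the scan continues,
 * so it is never counted as a successful activation.
 */
static bool activate_free_worker(struct worker *free_list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (free_list[i].state == WORKER_SLEEPING) {
            free_list[i].state = WORKER_RUNNING;
            return true;
        }
    }
    return false;
}

/*
 * Try to wake a free worker; if none could be activated, account for a
 * newly created one. Returns true when a new worker had to be created.
 */
static bool wake_worker(struct worker *free_list, size_t n, size_t *nr_created)
{
    assert(nr_created != NULL);
    if (activate_free_worker(free_list, n))
        return false;
    (*nr_created)++;
    return true;
}
```

In this model, a free list holding one exiting and one sleeping worker
wakes the sleeping one and creates nothing, while a list holding only an
exiting worker falls through to creating a new worker.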