This is a note to let you know that I've just added the patch titled io-wq: fix race between worker exiting and activating free worker to the 5.13-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: io-wq-fix-race-between-worker-exiting-and-activating-free-worker.patch and it can be found in the queue-5.13 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 83d6c39310b6d11199179f6384c2b0a415389597 Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe@xxxxxxxxx> Date: Tue, 3 Aug 2021 09:14:35 -0600 Subject: io-wq: fix race between worker exiting and activating free worker From: Jens Axboe <axboe@xxxxxxxxx> commit 83d6c39310b6d11199179f6384c2b0a415389597 upstream. Nadav correctly reports that we have a race between a worker exiting, and new work being queued. This can lead to work being queued behind an existing worker that could be sleeping on an event before it can run to completion, and hence introducing potential big latency gaps if we hit this race condition: cpu0 cpu1 ---- ---- io_wqe_worker() schedule_timeout() // timed out io_wqe_enqueue() io_wqe_wake_worker() // work_flags & IO_WQ_WORK_CONCURRENT io_wqe_activate_free_worker() io_worker_exit() Fix this by having the exiting worker go through the normal decrement of a running worker, which will spawn a new one if needed. The free worker activation is modified to only return success if we were able to find a sleeping worker - if not, we keep looking through the list. If we fail, we create a new worker as per usual. Cc: stable@xxxxxxxxxxxxxxx Link: https://lore.kernel.org/io-uring/BFF746C0-FEDE-4646-A253-3021C57C26C9@xxxxxxxxx/ Reported-by: Nadav Amit <nadav.amit@xxxxxxxxx> Tested-by: Nadav Amit <nadav.amit@xxxxxxxxx> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- fs/io-wq.c | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -131,6 +131,7 @@ struct io_cb_cancel_data { }; static void create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index); +static void io_wqe_dec_running(struct io_worker *worker); static bool io_worker_get(struct io_worker *worker) { @@ -169,26 +170,21 @@ static void io_worker_exit(struct io_wor { struct io_wqe *wqe = worker->wqe; struct io_wqe_acct *acct = io_wqe_get_acct(worker); - unsigned flags; if (refcount_dec_and_test(&worker->ref)) complete(&worker->ref_done); wait_for_completion(&worker->ref_done); - preempt_disable(); - current->flags &= ~PF_IO_WORKER; - flags = worker->flags; - worker->flags = 0; - if (flags & IO_WORKER_F_RUNNING) - atomic_dec(&acct->nr_running); - worker->flags = 0; - preempt_enable(); - raw_spin_lock_irq(&wqe->lock); - if (flags & IO_WORKER_F_FREE) + if (worker->flags & IO_WORKER_F_FREE) hlist_nulls_del_rcu(&worker->nulls_node); list_del_rcu(&worker->all_list); acct->nr_workers--; + preempt_disable(); + io_wqe_dec_running(worker); + worker->flags = 0; + current->flags &= ~PF_IO_WORKER; + preempt_enable(); raw_spin_unlock_irq(&wqe->lock); kfree_rcu(worker, rcu); @@ -215,15 +211,19 @@ static bool io_wqe_activate_free_worker( struct hlist_nulls_node *n; struct io_worker *worker; - n = rcu_dereference(hlist_nulls_first_rcu(&wqe->free_list)); - if (is_a_nulls(n)) - return false; - - worker = hlist_nulls_entry(n, struct io_worker, nulls_node); - if (io_worker_get(worker)) { - wake_up_process(worker->task); + /* + * Iterate free_list and see if we can find an idle worker to + * activate. If a given worker is on the free_list but in the process + * of exiting, keep trying. + */ + hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) { + if (!io_worker_get(worker)) + continue; + if (wake_up_process(worker->task)) { + io_worker_release(worker); + return true; + } io_worker_release(worker); - return true; } return false; Patches currently in stable-queue which might be from axboe@xxxxxxxxx are queue-5.13/blk-iolatency-error-out-if-blk_get_queue-failed-in-i.patch queue-5.13/io-wq-fix-lack-of-acct-nr_workers-acct-max_workers-j.patch queue-5.13/io-wq-fix-race-between-worker-exiting-and-activating-free-worker.patch queue-5.13/s390-dasd-fix-use-after-free-in-dasd-path-handling.patch queue-5.13/io-wq-fix-no-lock-protection-of-acct-nr_worker.patch