> On Aug 3, 2021, at 7:37 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
> On 8/3/21 7:22 AM, Jens Axboe wrote:
>> On 8/2/21 7:05 PM, Nadav Amit wrote:
>>> Hello Jens,
>>>
>>> I encountered an issue, which appears to be a race between
>>> io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address
>>> this issue and whether I am missing something, since this seems to
>>> occur in a common scenario. Your feedback (or fix ;-)) would be
>>> appreciated.
>>>
>>> I run on 5.13 a workload that issues multiple async read operations
>>> that should run concurrently. Some read operations cannot complete
>>> for unbounded time (e.g., a read from a pipe that is never written to).
>>> The problem is that occasionally another read operation that should
>>> complete gets stuck. My understanding, based on debugging and the code,
>>> is that the following race (or similar) occurs:
>>>
>>>
>>>  cpu0                                    cpu1
>>>  ----                                    ----
>>>                                          io_wqe_worker()
>>>                                           schedule_timeout()
>>>                                           // timed out
>>>  io_wqe_enqueue()
>>>   io_wqe_wake_worker()
>>>    // work_flags & IO_WQ_WORK_CONCURRENT
>>>    io_wqe_activate_free_worker()
>>>                                          io_worker_exit()
>>>
>>>
>>> Basically, io_wqe_wake_worker() can find a worker, but this worker is
>>> about to exit and is not going to process further work. Once the
>>> worker exits, the concurrency level decreases and async work might be
>>> blocked behind another work item. I had a look at 5.14, but did not see
>>> anything that might address this issue.
>>>
>>> Am I missing something?
>>>
>>> If not, all my ideas for a solution are either complicated (track the
>>> required concurrency level) or relaxed (spawn another worker on
>>> io_worker_exit if the work_list of unbounded work is not empty).
>>>
>>> As said, feedback would be appreciated.
>>
>> You are right that there's definitely a race here between checking the
>> freelist and finding a worker, but that worker is already exiting. Let
>> me mull over this a bit, I'll post something for you to try later today.
> 
> Can you try something like this? Just consider it a first tester, need
> to spend a bit more time on it to ensure we fully close the gap.

Thanks for the quick response.

I tried your version. It works better, but my workload still gets stuck
occasionally (less frequently, though). It is pretty obvious that the
version you sent still has a race, so I didn’t put the effort into
debugging it.

I should note that I have an ugly hack that does make my test pass. I
include it below, although it is obviously not the right solution.
diff --git a/fs/io-wq.c b/fs/io-wq.c
index b3e8624a37d0..a8792809e416 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -165,6 +165,17 @@ static void io_worker_ref_put(struct io_wq *wq)
 		complete(&wq->worker_done);
 }
 
+static void io_queue_worker_create(struct io_wqe *wqe, struct io_wqe_acct *acct);
+
+static inline bool io_wqe_run_queue(struct io_wqe *wqe)
+	__must_hold(wqe->lock)
+{
+	if (!wq_list_empty(&wqe->work_list) &&
+	    !(wqe->flags & IO_WQE_FLAG_STALLED))
+		return true;
+	return false;
+}
+
 static void io_worker_exit(struct io_worker *worker)
 {
 	struct io_wqe *wqe = worker->wqe;
@@ -192,17 +203,17 @@ static void io_worker_exit(struct io_worker *worker)
 	raw_spin_unlock_irq(&wqe->lock);
 
 	kfree_rcu(worker, rcu);
+	raw_spin_lock_irq(&wqe->lock);
+
+	if (!(flags & IO_WORKER_F_BOUND) && io_wqe_run_queue(wqe)) {
+		atomic_inc(&acct->nr_running);
+		atomic_inc(&wqe->wq->worker_refs);
+		io_queue_worker_create(wqe, acct);
+	}
 	io_worker_ref_put(wqe->wq);
-	do_exit(0);
-}
 
-static inline bool io_wqe_run_queue(struct io_wqe *wqe)
-	__must_hold(wqe->lock)
-{
-	if (!wq_list_empty(&wqe->work_list) &&
-	    !(wqe->flags & IO_WQE_FLAG_STALLED))
-		return true;
-	return false;
+	raw_spin_unlock_irq(&wqe->lock);
+	do_exit(0);
 }
 
 /*
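
For completeness, here is a rough userspace sketch (using liburing) of the
shape of workload I described above: one forced-async read from a pipe that
is never written to, plus one forced-async read that should complete
concurrently. It is only illustrative, not my actual reproducer, and it will
not reliably trigger the race on its own, since that also depends on an
unbounded worker idling out just as new work is enqueued; buffer sizes and
user_data values are arbitrary.

#include <stdio.h>
#include <unistd.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	static char buf1[64], buf2[64];
	int never[2], ready[2];

	if (io_uring_queue_init(8, &ring, 0))
		return 1;
	if (pipe(never) || pipe(ready))
		return 1;

	/* This pipe already has data, so its read should complete quickly. */
	if (write(ready[1], "ping", 4) != 4)
		return 1;

	/* Read from a pipe that is never written to: parks an io-wq worker
	 * for an unbounded time. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, never[0], buf1, sizeof(buf1), 0);
	io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
	io_uring_sqe_set_data(sqe, (void *)1);

	/* Read that should complete concurrently on another worker. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, ready[0], buf2, sizeof(buf2), 0);
	io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
	io_uring_sqe_set_data(sqe, (void *)2);

	io_uring_submit(&ring);

	/* Expect the completion for user_data == 2; if the concurrency level
	 * was lost as described above, this wait hangs behind the blocked
	 * pipe read. */
	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("completed user_data=%llu res=%d\n",
		       (unsigned long long)cqe->user_data, cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}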