> On Aug 3, 2021, at 7:37 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
> On 8/3/21 7:22 AM, Jens Axboe wrote:
>> On 8/2/21 7:05 PM, Nadav Amit wrote:
>>> Hello Jens,
>>>
>>> I encountered an issue, which appears to be a race between
>>> io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address
>>> this issue and whether I am missing something, since this seems to
>>> occur in a common scenario. Your feedback (or fix ;-)) would be
>>> appreciated.
>>>
>>> I run on 5.13 a workload that issues multiple async read operations
>>> that should run concurrently. Some read operations cannot complete
>>> for unbounded time (e.g., a read from a pipe that is never written to).
>>> The problem is that occasionally another read operation that should
>>> complete gets stuck. My understanding, based on debugging and the code,
>>> is that the following race (or similar) occurs:
>>>
>>>
>>>  cpu0                                    cpu1
>>>  ----                                    ----
>>>                                          io_wqe_worker()
>>>                                           schedule_timeout()
>>>                                           // timed out
>>>  io_wqe_enqueue()
>>>   io_wqe_wake_worker()
>>>    // work_flags & IO_WQ_WORK_CONCURRENT
>>>    io_wqe_activate_free_worker()
>>>                                          io_worker_exit()
>>>
>>>
>>> Basically, io_wqe_wake_worker() can find a worker, but this worker is
>>> about to exit and is not going to process further work. Once the
>>> worker exits, the concurrency level decreases and async work might be
>>> blocked behind another work item. I had a look at 5.14, but did not see
>>> anything that might address this issue.
>>>
>>> Am I missing something?
>>>
>>> If not, all my ideas for a solution are either complicated (track the
>>> required concurrency level) or relaxed (spawn another worker on
>>> io_worker_exit if the work_list of unbounded work is not empty).
>>>
>>> As said, feedback would be appreciated.
>>
>> You are right that there's definitely a race here between checking the
>> freelist and finding a worker, but that worker is already exiting. Let
>> me mull over this a bit, I'll post something for you to try later today.
> 
> Can you try something like this? Just consider it a first tester, need
> to spend a bit more time on it to ensure we fully close the gap.

Thanks for the quick response.

I tried your version. It works better, but my workload still gets stuck
occasionally (less frequently, though). It is pretty obvious that the
version you sent still has a race, so I didn’t put the effort into
debugging it.

I should note that I have an ugly hack that does make my test pass. I
include it below, although it is obviously not the right solution.
diff --git a/fs/io-wq.c b/fs/io-wq.c
index b3e8624a37d0..a8792809e416 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -165,6 +165,17 @@ static void io_worker_ref_put(struct io_wq *wq)
 		complete(&wq->worker_done);
 }
 
+static void io_queue_worker_create(struct io_wqe *wqe, struct io_wqe_acct *acct);
+
+static inline bool io_wqe_run_queue(struct io_wqe *wqe)
+	__must_hold(wqe->lock)
+{
+	if (!wq_list_empty(&wqe->work_list) &&
+	    !(wqe->flags & IO_WQE_FLAG_STALLED))
+		return true;
+	return false;
+}
+
 static void io_worker_exit(struct io_worker *worker)
 {
 	struct io_wqe *wqe = worker->wqe;
@@ -192,17 +203,17 @@ static void io_worker_exit(struct io_worker *worker)
 	raw_spin_unlock_irq(&wqe->lock);
 
 	kfree_rcu(worker, rcu);
+	raw_spin_lock_irq(&wqe->lock);
+
+	if (!(flags & IO_WORKER_F_BOUND) && io_wqe_run_queue(wqe)) {
+		atomic_inc(&acct->nr_running);
+		atomic_inc(&wqe->wq->worker_refs);
+		io_queue_worker_create(wqe, acct);
+	}
 	io_worker_ref_put(wqe->wq);
-	do_exit(0);
-}
 
-static inline bool io_wqe_run_queue(struct io_wqe *wqe)
-	__must_hold(wqe->lock)
-{
-	if (!wq_list_empty(&wqe->work_list) &&
-	    !(wqe->flags & IO_WQE_FLAG_STALLED))
-		return true;
-	return false;
+	raw_spin_unlock_irq(&wqe->lock);
+	do_exit(0);
 }
 
 /*
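
For completeness, here is a rough userspace sketch (using liburing) of the
shape of workload I described above: one forced-async read from a pipe that
is never written to, plus one forced-async read that should complete
concurrently. It is only illustrative, not my actual reproducer, and it will
not reliably trigger the race on its own, since that also depends on an
unbounded worker idling out just as new work is enqueued; buffer sizes and
user_data values are arbitrary.

#include <stdio.h>
#include <unistd.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	static char buf1[64], buf2[64];
	int never[2], ready[2];

	if (io_uring_queue_init(8, &ring, 0))
		return 1;
	if (pipe(never) || pipe(ready))
		return 1;

	/* This pipe already has data, so its read should complete quickly. */
	if (write(ready[1], "ping", 4) != 4)
		return 1;

	/* Read from a pipe that is never written to: parks an io-wq worker
	 * for an unbounded time. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, never[0], buf1, sizeof(buf1), 0);
	io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
	io_uring_sqe_set_data(sqe, (void *)1);

	/* Read that should complete concurrently on another worker. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, ready[0], buf2, sizeof(buf2), 0);
	io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
	io_uring_sqe_set_data(sqe, (void *)2);

	io_uring_submit(&ring);

	/* Expect the completion for user_data == 2; if the concurrency level
	 * was lost as described above, this wait hangs behind the blocked
	 * pipe read. */
	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("completed user_data=%llu res=%d\n",
		       (unsigned long long)cqe->user_data, cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}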