Re: Race between io_wqe_worker() and io_wqe_wake_worker()

On 8/3/21 10:37 PM, Jens Axboe wrote:
On 8/3/21 7:22 AM, Jens Axboe wrote:
On 8/2/21 7:05 PM, Nadav Amit wrote:
Hello Jens,

I encountered an issue, which appears to be a race between
io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address
this issue and whether I am missing something, since this seems to
occur in a common scenario. Your feedback (or fix ;-)) would be
appreciated.

I am running a workload on 5.13 that issues multiple async read
operations that should run concurrently. Some read operations cannot
complete for an unbounded time (e.g., a read from a pipe that is never
written to). The problem is that occasionally another read operation
that should complete gets stuck. My understanding, based on debugging
and reading the code, is that the following race (or something
similar) occurs:


   cpu0					cpu1
   ----					----
					io_wqe_worker()
					 schedule_timeout()
					 // timed out
   io_wqe_enqueue()
    io_wqe_wake_worker()
     // work_flags & IO_WQ_WORK_CONCURRENT
     io_wqe_activate_free_worker()
					 io_worker_exit()


Basically, io_wqe_wake_worker() can find a worker, but this worker is
about to exit and is not going to process further work. Once the
worker exits, the concurrency level decreases and pending async work
might be blocked behind another work item. I had a look at 5.14, but
did not see anything that might address this issue.

Am I missing something?

If not, all my ideas for a solution are either complicated (track the
required concurrency level) or relaxed (spawn another worker in
io_worker_exit() if the work_list of unbounded work is not empty).

As said, feedback would be appreciated.

You are right that there's definitely a race here between checking the
freelist and finding a worker, but that worker is already exiting. Let
me mull over this a bit, I'll post something for you to try later today.

Can you try something like this? Just consider it a first tester, need
to spend a bit more time on it to ensure we fully close the gap.


diff --git a/fs/io-wq.c b/fs/io-wq.c
index cf086b01c6c6..e2da2042ee9e 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -42,6 +42,7 @@ struct io_worker {
  	refcount_t ref;
  	unsigned flags;
  	struct hlist_nulls_node nulls_node;
+	unsigned long exiting;
  	struct list_head all_list;
  	struct task_struct *task;
  	struct io_wqe *wqe;
@@ -214,15 +215,20 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe)
  	struct hlist_nulls_node *n;
  	struct io_worker *worker;
 
-	n = rcu_dereference(hlist_nulls_first_rcu(&wqe->free_list));
-	if (is_a_nulls(n))
-		return false;
-
-	worker = hlist_nulls_entry(n, struct io_worker, nulls_node);
-	if (io_worker_get(worker)) {
-		wake_up_process(worker->task);
+	/*
+	 * Iterate free_list and see if we can find an idle worker to
+	 * activate. If a given worker is on the free_list but in the process
+	 * of exiting, keep trying.
+	 */
+	hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) {
+		if (!io_worker_get(worker))
+			continue;
+		if (!test_bit(0, &worker->exiting)) {
+			wake_up_process(worker->task);
+			io_worker_release(worker);
+			return true;
+		}
  		io_worker_release(worker);
-		return true;
  	}
 
  	return false;
@@ -560,8 +566,17 @@ static int io_wqe_worker(void *data)
  		if (ret)
  			continue;
  		/* timed out, exit unless we're the fixed worker */
-		if (!(worker->flags & IO_WORKER_F_FIXED))
+		if (!(worker->flags & IO_WORKER_F_FIXED)) {
+			/*
+			 * Someone elevated our refs, which could be trying
+			 * to re-activate for work. Loop one more time for
+			 * that case.
+			 */
+			if (refcount_read(&worker->ref) != 1)
+				continue;
+			set_bit(0, &worker->exiting);
  			break;
+		}
  	}
 
  	if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) {


The refcount check may not be enough; we may need another bit,
worker->in_use, with an ordering like:
    io_wqe_activate_free_worker                io_wqe_worker

     set_bit(worker->in_use)               set_bit(worker->exiting)
     !test_bit(worker->exiting)            test_bit(worker->in_use)
     wake_up(worker)                       goto handle work


