Re: [PATCH v2] io_uring: use TWA_SIGNAL for task_work if the task isn't running

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/7/20 3:50 PM, Jens Axboe wrote:
> On 8/7/20 12:00 PM, Jann Horn wrote:
>> On Fri, Aug 7, 2020 at 6:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> An earlier commit:
>>>
>>> b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")
>>>
>>> ensured that we didn't get stuck waiting for eventfd reads when it's
>>> registered with the io_uring ring for event notification, but we still
>>> have a gap where the task can be waiting on other events in the kernel
>>> and need a bigger nudge to make forward progress.
>>>
>>> Ensure that we use signaled notifications for a task that isn't currently
>>> running, to be certain the work is seen and processed immediately.
>>>
>>> Cc: stable@xxxxxxxxxxxxxxx # v5.7+
>>> Reported-by: Josef <josef.grieb@xxxxxxxxx>
>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>
>>> ---
>>>
>>> This isn't perfect, as it'll use TWA_SIGNAL even for cases where we
>>> don't absolutely need it (like task waiting for completions in
>>> io_cqring_wait()), but we don't have a good way to tell right now. We
>>> can probably improve on this in the future, for now I think this is the
>>> best solution.
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index e9b27cdaa735..b4300a61f231 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1720,7 +1720,7 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
>>>          */
>>>         if (ctx->flags & IORING_SETUP_SQPOLL)
>>>                 notify = 0;
>>> -       else if (ctx->cq_ev_fd)
>>> +       else if (ctx->cq_ev_fd || (tsk->state != TASK_RUNNING))
>>>                 notify = TWA_SIGNAL;
>>>
>>>         ret = task_work_add(tsk, cb, notify);
>>
>> I don't get it. Apart from still not understanding the big picture:
>>
>> What guarantees that the lockless read of tsk->state is in any way
>> related to the state of the remote process by the time we reach
>> task_work_add()? And why do we not need to signal in TASK_RUNNING
>> state (e.g. directly before the remote process switches to
>> TASK_INTERRUPTIBLE or something like that)?
> 
> Yeah it doesn't, the patch doesn't cover the racy case. As far as I can
> tell, we've got two ways to do it:
> 
> 1) We split the task_work_add() into two parts, one adding the work and
>    one doing the signaling. Then we could do:
> 
> int notify = TWA_RESUME;
> 
> __task_work_add(tsk, cb);
> 
> if (ctx->flags & IORING_SETUP_SQPOLL)
> 	notify = 0;
> else if (ctx->cq_ev_fd || (tsk->state != TASK_RUNNING))
> 	notify = TWA_SIGNAL;
> 
> __task_work_signal(tsk, notify);

Something like the attached - totally untested so far, but it implements
that idea.

-- 
Jens Axboe

>From ec67af3b1e0c9325a04d5d1c12311086349d3a79 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@xxxxxxxxx>
Date: Thu, 6 Aug 2020 19:41:50 -0600
Subject: [PATCH 2/2] io_uring: use TWA_SIGNAL for task_work if the task isn't
 running

An earlier commit:

b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")

ensured that we didn't get stuck waiting for eventfd reads when it's
registered with the io_uring ring for event notification, but we still
have a gap where the task can be waiting on other events in the kernel
and need a bigger nudge to make forward progress.

Ensure that we use signaled notifications for a task that isn't currently
running, to be certain the work is seen and processed immediately.

Cc: stable@xxxxxxxxxxxxxxx # v5.7+
Reported-by: Josef <josef.grieb@xxxxxxxxx>
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 fs/io_uring.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e9b27cdaa735..443eecdfeda9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1712,21 +1712,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret, notify = TWA_RESUME;
 
+	ret = __task_work_add(tsk, cb);
+	if (unlikely(ret))
+		return ret;
+
 	/*
 	 * SQPOLL kernel thread doesn't need notification, just a wakeup.
-	 * If we're not using an eventfd, then TWA_RESUME is always fine,
-	 * as we won't have dependencies between request completions for
-	 * other kernel wait conditions.
+	 * For any other work, use signaled wakeups if the task isn't
+	 * running to avoid dependencies between tasks or threads. If
+	 * the issuing task is currently waiting in the kernel on a thread,
+	 * and same thread is waiting for a completion event, then we need
+	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
+	 * is needed for that.
 	 */
 	if (ctx->flags & IORING_SETUP_SQPOLL)
 		notify = 0;
-	else if (ctx->cq_ev_fd)
+	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
 		notify = TWA_SIGNAL;
 
-	ret = task_work_add(tsk, cb, notify);
-	if (!ret)
-		wake_up_process(tsk);
-	return ret;
+	__task_work_notify(tsk, notify);
+	wake_up_process(tsk);
+	return 0;
 }
 
 static void __io_req_task_cancel(struct io_kiocb *req, int error)
-- 
2.28.0

>From 802b09d10bdd2f90e510049b1c52b81edc2ae0a3 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@xxxxxxxxx>
Date: Fri, 7 Aug 2020 16:00:53 -0600
Subject: [PATCH 1/2] kernel: split task_work_add() into two separate helpers

Some callers may need to make signaling decisions based on the state
of the targeted task, and that can only safely be done post adding
the task_work to the task. Split task_work_add() into:

__task_work_add()	- adds the work item
__task_work_notify()	- sends the notification

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 include/linux/task_work.h | 19 ++++++++++++++++
 kernel/task_work.c        | 48 +++++++++++++++++++++------------------
 2 files changed, 45 insertions(+), 22 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 0fb93aafa478..7abbd8df5e13 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -5,6 +5,8 @@
 #include <linux/list.h>
 #include <linux/sched.h>
 
+extern struct callback_head work_exited;
+
 typedef void (*task_work_func_t)(struct callback_head *);
 
 static inline void
@@ -13,6 +15,21 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 	twork->func = func;
 }
 
+static inline int __task_work_add(struct task_struct *task,
+				  struct callback_head *work)
+{
+	struct callback_head *head;
+
+	do {
+		head = READ_ONCE(task->task_works);
+		if (unlikely(head == &work_exited))
+			return -ESRCH;
+		work->next = head;
+	} while (cmpxchg(&task->task_works, head, work) != head);
+
+	return 0;
+}
+
 #define TWA_RESUME	1
 #define TWA_SIGNAL	2
 int task_work_add(struct task_struct *task, struct callback_head *twork, int);
@@ -20,6 +37,8 @@ int task_work_add(struct task_struct *task, struct callback_head *twork, int);
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
 void task_work_run(void);
 
+void __task_work_notify(struct task_struct *task, int notify);
+
 static inline void exit_task_work(struct task_struct *task)
 {
 	task_work_run();
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 5c0848ca1287..9bde81481984 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -3,7 +3,27 @@
 #include <linux/task_work.h>
 #include <linux/tracehook.h>
 
-static struct callback_head work_exited; /* all we need is ->next == NULL */
+struct callback_head work_exited = {
+	.next = NULL	/* all we need is ->next == NULL */
+};
+
+void __task_work_notify(struct task_struct *task, int notify)
+{
+	unsigned long flags;
+
+	switch (notify) {
+	case TWA_RESUME:
+		set_notify_resume(task);
+		break;
+	case TWA_SIGNAL:
+		if (lock_task_sighand(task, &flags)) {
+			task->jobctl |= JOBCTL_TASK_WORK;
+			signal_wake_up(task, 0);
+			unlock_task_sighand(task, &flags);
+		}
+		break;
+	}
+}
 
 /**
  * task_work_add - ask the @task to execute @work->func()
@@ -27,29 +47,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
 int
 task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 {
-	struct callback_head *head;
-	unsigned long flags;
+	int ret;
 
-	do {
-		head = READ_ONCE(task->task_works);
-		if (unlikely(head == &work_exited))
-			return -ESRCH;
-		work->next = head;
-	} while (cmpxchg(&task->task_works, head, work) != head);
-
-	switch (notify) {
-	case TWA_RESUME:
-		set_notify_resume(task);
-		break;
-	case TWA_SIGNAL:
-		if (lock_task_sighand(task, &flags)) {
-			task->jobctl |= JOBCTL_TASK_WORK;
-			signal_wake_up(task, 0);
-			unlock_task_sighand(task, &flags);
-		}
-		break;
-	}
+	ret = __task_work_add(task, work);
+	if (unlikely(ret))
+		return ret;
 
+	__task_work_notify(task, notify);
 	return 0;
 }
 
-- 
2.28.0


[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux