Re: [PATCH 2/2] io_uring: use TWA_SIGNAL for task_work if the task isn't running

On 8/10/20 9:02 AM, Jens Axboe wrote:
> On 8/10/20 5:42 AM, peterz@xxxxxxxxxxxxx wrote:
>> On Sat, Aug 08, 2020 at 12:34:39PM -0600, Jens Axboe wrote:
>>> An earlier commit:
>>>
>>> b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")
>>>
>>> ensured that we didn't get stuck waiting for eventfd reads when it's
>>> registered with the io_uring ring for event notification, but we still
>>> have a gap where the task can be waiting on other events in the kernel
>>> and need a bigger nudge to make forward progress.
>>>
>>> Ensure that we use signaled notifications for a task that isn't currently
>>> running, to be certain the work is seen and processed immediately.
>>>
>>> Cc: stable@xxxxxxxxxxxxxxx # v5.7+
>>> Reported-by: Josef <josef.grieb@xxxxxxxxx>
>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>> ---
>>>  fs/io_uring.c | 22 ++++++++++++++--------
>>>  1 file changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index e9b27cdaa735..443eecdfeda9 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1712,21 +1712,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
>>>  	struct io_ring_ctx *ctx = req->ctx;
>>>  	int ret, notify = TWA_RESUME;
>>>  
>>> +	ret = __task_work_add(tsk, cb);
>>> +	if (unlikely(ret))
>>> +		return ret;
>>> +
>>>  	/*
>>>  	 * SQPOLL kernel thread doesn't need notification, just a wakeup.
>>> -	 * If we're not using an eventfd, then TWA_RESUME is always fine,
>>> -	 * as we won't have dependencies between request completions for
>>> -	 * other kernel wait conditions.
>>> +	 * For any other work, use signaled wakeups if the task isn't
>>> +	 * running to avoid dependencies between tasks or threads. If
>>> +	 * the issuing task is currently waiting in the kernel on a thread,
>>> +	 * and same thread is waiting for a completion event, then we need
>>> +	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
>>> +	 * is needed for that.
>>>  	 */
>>>  	if (ctx->flags & IORING_SETUP_SQPOLL)
>>>  		notify = 0;
>>> -	else if (ctx->cq_ev_fd)
>>> +	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
>>>  		notify = TWA_SIGNAL;
>>>  
>>> -	ret = task_work_add(tsk, cb, notify);
>>> -	if (!ret)
>>> -		wake_up_process(tsk);
>>> -	return ret;
>>> +	__task_work_notify(tsk, notify);
>>> +	wake_up_process(tsk);
>>> +	return 0;
>>>  }
>>
>> Wait.. so the only change here is that you look at tsk->state, _after_
>> doing __task_work_add(), but nothing, not the Changelog nor the comment
>> explains this.
>>
>> So you're relying on __task_work_add() being an smp_mb() vs the add, and
>> you order this against the smp_mb() in set_current_state() ?
>>
>> This really needs spelling out.
> 
> I'll update the changelog, it suffers a bit from having been reused from
> the earlier versions. Thanks for checking!

I failed to convince myself that the existing construct was safe, so
here's an incremental on top of that. Basically we re-check the task
state _after_ the initial notification, to protect against the case
where we initially find the task running, but it goes to sleep between
that check and the notification. The window should be pretty slim, but
I think it's there.

Hence loop around the check and the notification if we're using
TWA_RESUME.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 44ac103483b6..a4ecb6c7e2b0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1780,12 +1780,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
 	 * is needed for that.
 	 */
-	if (ctx->flags & IORING_SETUP_SQPOLL)
+	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		notify = 0;
-	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
-		notify = TWA_SIGNAL;
+	} else {
+		bool notified = false;
 
-	__task_work_notify(tsk, notify);
+		/*
+		 * If the task is running, TWA_RESUME notify is enough. Make
+		 * sure to re-check after we've sent the notification, as not
+		 * to have a race between the check and the notification. This
+		 * only applies for TWA_RESUME, as TWA_SIGNAL is safe with a
+		 * sleeping task
+		 */
+		do {
+			if (READ_ONCE(tsk->state) != TASK_RUNNING)
+				notify = TWA_SIGNAL;
+			else if (notified)
+				break;
+			__task_work_notify(tsk, notify);
+			notified = true;
+		} while (notify != TWA_SIGNAL);
+	}
 	wake_up_process(tsk);
 	return 0;
 }

and I've folded it in here:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=8d685b56f80b16516be9ce2eb1aee5adcfba13ff
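
For anyone who wants to poke at the control flow of that loop outside
the kernel, here's a minimal userspace sketch. It is not the kernel
code: fake_task, read_state(), send_notify(), notify_with_recheck() and
the NOTIFY_*/STATE_* names are all made up for illustration, standing in
for the task, READ_ONCE(tsk->state), __task_work_notify() and the
TWA_*/TASK_RUNNING values. It is single-threaded and just replays a
scripted sequence of observed states, so it shows which notifications
the loop sends for a given sequence, and says nothing about the memory
ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TWA_RESUME / TWA_SIGNAL. */
enum notify_mode { NOTIFY_RESUME = 1, NOTIFY_SIGNAL = 2 };

/* Hypothetical stand-ins for TASK_RUNNING / any sleeping state. */
enum task_state { STATE_RUNNING = 0, STATE_SLEEPING = 1 };

/* A scripted "task": each read_state() call returns the next state. */
struct fake_task {
	const enum task_state *states;	/* scripted observations */
	int nr_states;			/* number of scripted entries */
	int reads;			/* how many times state was read */
	int resume_notifies;		/* NOTIFY_RESUME notifications sent */
	int signal_notifies;		/* NOTIFY_SIGNAL notifications sent */
};

/* Stand-in for READ_ONCE(tsk->state); repeats the last entry when the
 * script runs out. */
static enum task_state read_state(struct fake_task *t)
{
	int i;

	assert(t->nr_states > 0);
	i = t->reads < t->nr_states ? t->reads : t->nr_states - 1;
	t->reads++;
	return t->states[i];
}

/* Stand-in for __task_work_notify(); only counts what was sent. */
static void send_notify(struct fake_task *t, enum notify_mode mode)
{
	if (mode == NOTIFY_SIGNAL)
		t->signal_notifies++;
	else
		t->resume_notifies++;
}

/*
 * Same shape as the loop in the incremental: notify, then re-check the
 * state. If the task is found sleeping at any check, switch to the
 * signal-style notification and stop. If it is still running after a
 * resume-style notification has been sent, stop as well.
 */
static enum notify_mode notify_with_recheck(struct fake_task *t)
{
	enum notify_mode notify = NOTIFY_RESUME;
	bool notified = false;

	do {
		if (read_state(t) != STATE_RUNNING)
			notify = NOTIFY_SIGNAL;
		else if (notified)
			break;
		send_notify(t, notify);
		notified = true;
	} while (notify != NOTIFY_SIGNAL);

	return notify;
}
```

With a script of "running, running" the sketch sends a single
resume-style notification. With "running, sleeping" (the racy case the
incremental is about) it sends the resume-style one first and then
follows up with the signal-style one. With "sleeping" it goes straight
to the signal-style notification.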

-- 
Jens Axboe



