On 1/30/23 3:45 AM, Dylan Yudaken wrote:
> On Sun, 2023-01-29 at 16:17 -0700, Jens Axboe wrote:
>> On 1/29/23 3:57 PM, Jens Axboe wrote:
>>> On 1/27/23 6:52 AM, Dylan Yudaken wrote:
>>>> REQ_F_FORCE_ASYNC was being ignored for re-queueing linked
>>>> requests. Instead obey that flag.
>>>>
>>>> Signed-off-by: Dylan Yudaken <dylany@xxxxxxxx>
>>>> ---
>>>>  io_uring/io_uring.c | 8 +++++---
>>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index db623b3185c8..980ba4fda101 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -1365,10 +1365,12 @@ void io_req_task_submit(struct io_kiocb *req, bool *locked)
>>>>  {
>>>>  	io_tw_lock(req->ctx, locked);
>>>>  	/* req->task == current here, checking PF_EXITING is safe */
>>>> -	if (likely(!(req->task->flags & PF_EXITING)))
>>>> -		io_queue_sqe(req);
>>>> -	else
>>>> +	if (unlikely(req->task->flags & PF_EXITING))
>>>>  		io_req_defer_failed(req, -EFAULT);
>>>> +	else if (req->flags & REQ_F_FORCE_ASYNC)
>>>> +		io_queue_iowq(req, locked);
>>>> +	else
>>>> +		io_queue_sqe(req);
>>>>  }
>>>>
>>>>  void io_req_task_queue_fail(struct io_kiocb *req, int ret)
>>>
>>> This one causes a failure for me with test/multicqes_drain.t, which
>>> doesn't quite make sense to me (just yet), but it is a reliable
>>> timeout.
>>
>> OK, quick look and I think this is a bad assumption in the test case.
>> It's assuming that a POLL_ADD already succeeded, and hence that a
>> subsequent POLL_REMOVE will succeed. But now it's getting ENOENT as
>> we can't find it just yet, which means the cancelation itself isn't
>> being done. So we just end up waiting for something that doesn't
>> happen.
>>
>> Or it could be an internal race with lookup/issue. In any case, it's
>> definitely being exposed by this patch.
>>
> 
> That is a bit of an unpleasant test.
> Essentially it triggers a pipe, and reads from the pipe immediately
> after. The test expects to see a CQE for that trigger, but if
> anything ran asynchronously then there is a race between the read and
> the poll logic running.
> 
> The attached patch fixes the test, but the reason my patches trigger it
> is a bit weird.
> 
> This occurs on the second loop of the test, after the initial drain.
> Essentially ctx->drain_active is still true when the second set of
> polls is added, since drain_active is only cleared inside the next
> io_drain_req. So the first poll will have REQ_F_FORCE_ASYNC set.
> 
> Previously those FORCE_ASYNC flags were being ignored, but now with
> "io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async"
> those requests get sent to the work thread, which causes the race.
> 
> I wonder if drain_active should actually be cleared earlier? Perhaps
> before setting the REQ_F_FORCE_ASYNC flag?
> The drain logic is pretty complex though, so I am not terribly keen to
> start changing it if it's not generally useful.

Pavel, any input on the drain logic? I think you know that part the best.

-- 
Jens Axboe
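
[Editor's note: a minimal userspace sketch of the racy pattern described above.
This is a hypothetical illustration only, not the actual test/multicqes_drain.c
code nor the attached fix; function and variable names are invented. The idea:
arm a poll on a pipe, trigger it, then immediately read the data back. If the
POLL_ADD was punted to an io-wq worker, the read can empty the pipe before the
poll is armed and the expected CQE may never arrive.]

    /* Build with: gcc -o poll_race poll_race.c -luring */
    #include <liburing.h>
    #include <poll.h>
    #include <unistd.h>

    /* Arm a poll on the pipe, trigger it, then consume the trigger right away.
     * If the POLL_ADD went async (e.g. REQ_F_FORCE_ASYNC was set while
     * drain_active was still true), the poll may not be armed yet when read()
     * empties the pipe, so io_uring_wait_cqe() can block forever. */
    static int trigger_and_wait(struct io_uring *ring, int pipefd[2])
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            struct io_uring_cqe *cqe;
            char c = 1;

            io_uring_prep_poll_add(sqe, pipefd[0], POLLIN);
            io_uring_submit(ring);

            write(pipefd[1], &c, 1);   /* trigger the poll */
            read(pipefd[0], &c, 1);    /* immediately consume the trigger */

            return io_uring_wait_cqe(ring, &cqe);  /* may never complete */
    }

    int main(void)
    {
            struct io_uring ring;
            int pipefd[2];

            io_uring_queue_init(8, &ring, 0);
            pipe(pipefd);
            return trigger_and_wait(&ring, pipefd);
    }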