On 1/30/23 3:45 AM, Dylan Yudaken wrote:
> On Sun, 2023-01-29 at 16:17 -0700, Jens Axboe wrote:
>> On 1/29/23 3:57 PM, Jens Axboe wrote:
>>> On 1/27/23 6:52 AM, Dylan Yudaken wrote:
>>>> REQ_F_FORCE_ASYNC was being ignored for re-queueing linked
>>>> requests. Instead obey that flag.
>>>>
>>>> Signed-off-by: Dylan Yudaken <dylany@xxxxxxxx>
>>>> ---
>>>>  io_uring/io_uring.c | 8 +++++---
>>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index db623b3185c8..980ba4fda101 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -1365,10 +1365,12 @@ void io_req_task_submit(struct io_kiocb *req, bool *locked)
>>>>  {
>>>>  	io_tw_lock(req->ctx, locked);
>>>>  	/* req->task == current here, checking PF_EXITING is safe */
>>>> -	if (likely(!(req->task->flags & PF_EXITING)))
>>>> -		io_queue_sqe(req);
>>>> -	else
>>>> +	if (unlikely(req->task->flags & PF_EXITING))
>>>>  		io_req_defer_failed(req, -EFAULT);
>>>> +	else if (req->flags & REQ_F_FORCE_ASYNC)
>>>> +		io_queue_iowq(req, locked);
>>>> +	else
>>>> +		io_queue_sqe(req);
>>>>  }
>>>>
>>>>  void io_req_task_queue_fail(struct io_kiocb *req, int ret)
>>>
>>> This one causes a failure for me with test/multicqes_drain.t, which
>>> doesn't quite make sense to me (just yet), but it is a reliable
>>> timeout.
>>
>> OK, quick look and I think this is a bad assumption in the test case.
>> It's assuming that a POLL_ADD already succeeded, and hence that a
>> subsequent POLL_REMOVE will succeed. But now it's getting ENOENT as
>> we can't find it just yet, which means the cancelation itself isn't
>> being done. So we just end up waiting for something that doesn't
>> happen.
>>
>> Or it could be an internal race with lookup/issue. In any case, it's
>> definitely being exposed by this patch.
>>
> 
> That is a bit of an unpleasant test.
> Essentially it triggers a pipe, and reads from the pipe immediately
> after. The test expects to see a CQE for that trigger, but if
> anything ran asynchronously then there is a race between the read and
> the poll logic running.
> 
> The attached patch fixes the test, but the reason my patches trigger it
> is a bit weird.
> 
> This occurs on the second loop of the test, after the initial drain.
> Essentially ctx->drain_active is still true when the second set of
> polls is added, since drain_active is only cleared inside the next
> io_drain_req. So the first poll will have REQ_F_FORCE_ASYNC set.
> 
> Previously those FORCE_ASYNC flags were being ignored, but now with
> "io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async"
> those requests get sent to the work thread, which causes the race.
> 
> I wonder if drain_active should actually be cleared earlier? Perhaps
> before setting the REQ_F_FORCE_ASYNC flag?
> The drain logic is pretty complex though, so I am not terribly keen to
> start changing it if it's not generally useful.

Pavel, any input on the drain logic? I think you know that part the best.

-- 
Jens Axboe
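
[Editor's note: a minimal userspace sketch of the racy pattern described above.
This is a hypothetical illustration only, not the actual test/multicqes_drain.c
code nor the attached fix; function and variable names are invented. The idea:
arm a poll on a pipe, trigger it, then immediately read the data back. If the
POLL_ADD was punted to an io-wq worker, the read can empty the pipe before the
poll is armed and the expected CQE may never arrive.]

    /* Build with: gcc -o poll_race poll_race.c -luring */
    #include <liburing.h>
    #include <poll.h>
    #include <unistd.h>

    /* Arm a poll on the pipe, trigger it, then consume the trigger right away.
     * If the POLL_ADD went async (e.g. REQ_F_FORCE_ASYNC was set while
     * drain_active was still true), the poll may not be armed yet when read()
     * empties the pipe, so io_uring_wait_cqe() can block forever. */
    static int trigger_and_wait(struct io_uring *ring, int pipefd[2])
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            struct io_uring_cqe *cqe;
            char c = 1;

            io_uring_prep_poll_add(sqe, pipefd[0], POLLIN);
            io_uring_submit(ring);

            write(pipefd[1], &c, 1);   /* trigger the poll */
            read(pipefd[0], &c, 1);    /* immediately consume the trigger */

            return io_uring_wait_cqe(ring, &cqe);  /* may never complete */
    }

    int main(void)
    {
            struct io_uring ring;
            int pipefd[2];

            io_uring_queue_init(8, &ring, 0);
            pipe(pipefd);
            return trigger_and_wait(&ring, pipefd);
    }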