On 9/9/21 9:29 AM, Hao Xu wrote:
> On 2021/9/9 3:01 PM, Hao Xu wrote:
>> On 2021/9/8 8:03 PM, Pavel Begunkov wrote:
>>> On 9/8/21 12:21 PM, Hao Xu wrote:
>>>> On 2021/9/7 2:48 PM, Hao Xu wrote:
>>>>> On 2021/9/7 3:04 AM, Pavel Begunkov wrote:
>>>>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>>>>> For operations like accept, multishot is a useful feature, since it
>>>>>>> reduces the number of accept SQEs needed. Let's integrate it into
>>>>>>> fast poll; it may be useful for other operations in the future.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>>  fs/io_uring.c | 15 ++++++++++++---
>>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index d6df60c4cdb9..dae7044e0c24 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>>>>>>  		return;
>>>>>>>  	}
>>>>>>> -	hash_del(&req->hash_node);
>>>>>>> -	io_poll_remove_double(req);
>>>>>>> +	if (READ_ONCE(apoll->poll.canceled))
>>>>>>> +		apoll->poll.events |= EPOLLONESHOT;
>>>>>>> +	if (apoll->poll.events & EPOLLONESHOT) {
>>>>>>> +		hash_del(&req->hash_node);
>>>>>>> +		io_poll_remove_double(req);
>>>>>>> +	} else {
>>>>>>> +		add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>>>>>
>>>>>> It looks like it does both io_req_task_submit() and adding back
>>>>>> to the wq, so io_issue_sqe() may be called in parallel with
>>>>>> io_async_task_func(). If so, there will be tons of all kinds of
>>>>>> races.
>>>>> IMHO, io_async_task_func() is called in the original context one by
>>>>> one (except when PF_EXITING is set, in which case it is called from
>>>>> the system wq), so there shouldn't be a parallel case there.
>>>> ping...
>>>
>>> fwiw, the case we're talking about:
>>>
>>> CPU0                        | CPU1
>>> io_async_task_func()        |
>>> -> add_wait_queue();        |
>>> -> io_req_task_submit();    |
>>> /* no tw run happened in between */
>>>                             | io_async_task_func()
>>>                             | --> io_req_task_submit()
>>>
>>> We called io_req_task_submit() twice without running tw in between;
>>> both calls use the same req->io_task_work.node field in the request
>>> for accounting, and so the second call will screw up tctx->task_list,
>>> and not only by failing to consider that req->io_task_work.node is
>>> already taken/enqueued.
>>>
>>> io_req_task_work_add() {
>>> 	wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>> }
>>>
>> I guess you mean io_req_task_work_add() called by async_wake() two times:
>>
>> io_async_task_func()
>> -> add_wait_queue()
>>                             async_wake()
>>                             -> io_req_task_work_add()
>>
>> This one messes up the running task_work list,
>> since req->io_task_work.node is in use.
>>
>> It seems the current poll_add + multishot logic has this issue too; I'll
>> give it a shot (simply clearing req->io_task_work.node before running
>> req->io_task_work.func should work).
> Similar issue for the double wait entry, since we didn't remove the
> double entry in the interrupt handler:

Yep, sounds like that. Polling needs reworking, and not only
because of this one.
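To make the task_list corruption concrete: below is a minimal userspace
sketch, where the list implementation is a simplified stand-in for the
kernel's io-wq helpers, not the actual io_uring code. A second tail-append
of a node that is already enqueued leaves the node pointing at itself, so
draining the list would spin forever:

/*
 * Simplified stand-in for an intrusive singly-linked tail-append list
 * like tctx->task_list. Enqueueing the same node twice, without a
 * task_work run in between, creates a self-loop.
 */
#include <stdio.h>

struct wq_node {
	struct wq_node *next;
};

struct wq_list {
	struct wq_node *first;
	struct wq_node *last;
};

static void wq_list_add_tail(struct wq_node *node, struct wq_list *list)
{
	node->next = NULL;
	if (!list->first)
		list->first = node;
	else
		list->last->next = node;  /* second add: node->next = node */
	list->last = node;
}

int main(void)
{
	struct wq_list task_list = { NULL, NULL };
	struct wq_node node = { NULL };          /* req->io_task_work.node */

	wq_list_add_tail(&node, &task_list);     /* first async_wake()     */
	wq_list_add_tail(&node, &task_list);     /* second, before tw runs */

	/* Draining now revisits the same node forever; cap at 5 visits. */
	int visits = 0;
	for (struct wq_node *n = task_list.first; n && visits < 5; n = n->next)
		printf("visit %d: %p\n", ++visits, (void *)n);
	return 0;
}

Hao Xu's suggestion above (clearing req->io_task_work.node before running
req->io_task_work.func) targets exactly this reuse of the intrusive node.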
> async_wake() --> io_req_task_work_add()
> io_poll_double_wake() --> async_wake() --> io_req_task_work_add()
>
>>>>>>
>>>>>>> +	}
>>>>>>> +
>>>>>>>  	spin_unlock(&ctx->completion_lock);
>>>>>>>  	if (!READ_ONCE(apoll->poll.canceled))
>>>>>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>>>  	struct io_ring_ctx *ctx = req->ctx;
>>>>>>>  	struct async_poll *apoll;
>>>>>>>  	struct io_poll_table ipt;
>>>>>>> -	__poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>>>>>> +	__poll_t ret, mask = POLLERR | POLLPRI;
>>>>>>>  	int rw;
>>>>>>>  	if (!req->file || !file_can_poll(req->file))
>>>>>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>>>  		rw = WRITE;
>>>>>>>  		mask |= POLLOUT | POLLWRNORM;
>>>>>>>  	}
>>>>>>> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>>>>>> +		mask |= EPOLLONESHOT;
>>>>>>>  	/* if we can't nonblock try, then no point in arming a poll handler */
>>>>>>>  	if (!io_file_supports_nowait(req, rw))
>>>>>>>
>>>>>>
>>>>
>>>
>

-- 
Pavel Begunkov
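For the behavior the patch is after: a minimal userspace simulation
(hypothetical types and names, not io_uring's implementation) of the
difference the EPOLLONESHOT bit makes. A oneshot poll entry is removed
from the wait queue on its first wake, while a multishot entry stays
queued and fires once per event, which is what lets a single accept SQE
serve many incoming connections:

#include <stdbool.h>
#include <stdio.h>

struct poll_entry {
	bool oneshot;
	bool queued;
	int fires;
};

static void wake(struct poll_entry *e)
{
	if (!e->queued)
		return;                  /* oneshot was already torn down */
	e->fires++;
	if (e->oneshot)
		e->queued = false;       /* hash_del() + wait-queue removal */
	/* multishot: stays queued, fires again on the next event */
}

int main(void)
{
	struct poll_entry oneshot = { .oneshot = true,  .queued = true };
	struct poll_entry multi   = { .oneshot = false, .queued = true };

	for (int i = 0; i < 3; i++) {    /* three incoming connections */
		wake(&oneshot);
		wake(&multi);
	}
	printf("oneshot fired %d time(s), multishot fired %d time(s)\n",
	       oneshot.fires, multi.fires);  /* 1 vs 3 */
	return 0;
}

This is why the patch drops EPOLLONESHOT from the default mask in
io_arm_poll_handler() and sets it only when REQ_F_APOLL_MULTISHOT is
not requested.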