On 01/04/2021 07:53, Hao Xu wrote:
> On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
>>
>>
>> On 31/03/2021 10:01, Hao Xu wrote:
>>> Now that we have multishot poll requests, one sqe can emit multiple
>>> cqes. Given the example below:
>>>
>>>     sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>>>
>>> sqe2 is designed to issue after sqe0 and sqe1 have completed, but
>>> since sqe0 is a multishot poll request, sqe2 may be issued after
>>> sqe0's event has triggered twice, before sqe1 completes. This isn't
>>> what users leverage drain requests for.
>>> A simple solution is to ignore all multishot poll cqes, which means
>>> drain requests won't wait for those requests to be done.
>>>
>>> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
>>> ---
>>>  fs/io_uring.c | 9 +++++++--
>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 513096759445..cd6d44cf5940 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>>  	struct callback_head		*exit_task_work;
>>>  	struct wait_queue_head		hash_wait;
>>> +	unsigned			multishot_cqes;
>>>  	/* Keep this last, we don't need it for the fast path */
>>>  	struct work_struct		exit_work;
>>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>>  	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>>  		struct io_ring_ctx *ctx = req->ctx;
>>> -		return seq != ctx->cached_cq_tail
>>> -				+ READ_ONCE(ctx->cached_cq_overflow);
>>> +		return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>>> +				+ READ_ONCE(ctx->cached_cq_overflow);
>>>  	}
>>>  	return false;
>>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>  {
>>>  	struct io_ring_ctx *ctx = req->ctx;
>>>  	unsigned flags = IORING_CQE_F_MORE;
>>> +	bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>>  	if (!error && req->poll.canceled) {
>>>  		error = -ECANCELED;
>>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>  		req->poll.done = true;
>>>  		flags = 0;
>>>  	}
>>> +	if (multishot_poll)
>>> +		ctx->multishot_cqes++;
>>> +
>>
>> We need to make sure we do that only for a non-final complete, i.e.
>> not killing the request, otherwise it'll double account the last one.
> Hi Pavel, I saw that a killing request like iopoll_remove or
> async_cancel calls io_cqring_fill_event() to create an ECANCELED cqe
> for the original poll request. So there could be cases like (even for a
> single poll request):
>
> (1). add poll --> cancel poll, an ECANCELED cqe.
>      1 sqe : 1 cqe, all good
> (2). add poll --> trigger event (queued to task_work) --> cancel poll,
>      an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
>      1 sqe : 2 cqes

Those should emit a CQE on behalf of the request they're cancelling only
when it's definitely cancelled and not going to fill one itself. E.g. if
io_poll_cancel() found it and removed it from all the lists and the
core's poll infra. At least before multi-cqe it should have been working
fine.

> I suggest we shall only emit one ECANCELED cqe. Currently I only
> account cqes through io_poll_complete(), so ECANCELED cqes from
> io_poll_remove or async_cancel etc. are not counted in.
>
>> E.g. is a failed __io_cqring_fill_event() in io_poll_complete() fine?
>> Other places?
> A failed __io_cqring_fill_event() doesn't produce a cqe but increments
> ctx->cached_cq_overflow; as long as a cqe is produced or
> cached_cq_overflow is incremented, it is ok.

Not claiming that the case is broken, but cached_cq_overflow is
considered in req_need_defer() as well, so from its perspective there is
not much difference between a fill_event() that succeeded and one that
did not.

>>
>> Btw, we could use some tests :)
> I'll do more tests.

Perfect!

>>
>>
>>>  	io_commit_cqring(ctx);
>>>  	return !(flags & IORING_CQE_F_MORE);
>>>  }
>>>

-- 
Pavel Begunkov