[PATCH 2/3] io_uring: maintain drain logic for multishot requests

Now that we have multishot poll requests, one sqe can emit multiple
cqes. Consider the example below:
    sqe0(multishot poll)-->sqe1-->sqe2(drain req)
sqe2 is supposed to be issued only after sqe0 and sqe1 have completed,
but since sqe0 is a multishot poll request, sqe2 may be issued once
sqe0's event has triggered twice, before sqe1 has completed. This isn't
what users leverage drain requests for.
Here a simple solution is to ignore all multishot poll cqes, which means
drain requests won't wait for those requests to complete.
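
From userspace, the chain above could look roughly like the sketch
below. This is illustrative only: it assumes liburing and the
IOSQE_MULTI_CQES sqe flag proposed in this series; the helper name and
fd are made up, and error handling (io_uring_get_sqe() may return NULL)
is omitted.

    #include <liburing.h>
    #include <poll.h>

    static void submit_chain(struct io_uring *ring, int pollfd)
    {
        struct io_uring_sqe *sqe;

        /* sqe0: multishot poll, may emit many cqes; marked with the
         * (proposed) IOSQE_MULTI_CQES flag so drain can discount it */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_poll_add(sqe, pollfd, POLLIN);
        sqe->flags |= IOSQE_MULTI_CQES;

        /* sqe1: an ordinary one-cqe request */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_nop(sqe);

        /* sqe2: the drain request; with this patch it waits for sqe1
         * but no longer counts sqe0's multishot cqes */
        sqe = io_uring_get_sqe(ring);
        io_uring_prep_nop(sqe);
        sqe->flags |= IOSQE_IO_DRAIN;

        io_uring_submit(ring);
    }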
To achieve this, we should reconsider the req_need_defer equation. The
original one is:

    all_sqes (excluding dropped ones) == all_cqes (including overflowed ones)

which means we issue a drain request only when all the previously
submitted sqes have generated their cqes.
Now we should leave multishot requests out of both sides, so:

    all_sqes - multishot_sqes == all_cqes - multishot_cqes
    ==> all_sqes + (multishot_cqes - multishot_sqes) == all_cqes

Thus we have to track both the submission of a multishot request and
the cqes it generates, including any -ECANCELED cqe. Here we introduce

    cq_extra = multishot_cqes - multishot_sqes

for it.
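
As a worked example, take the chain above and assume no cq overflow.
sqe2's drain sequence is seq = 2, since two sqes were submitted ahead
of it:

    sqe0 submitted (multishot):  cq_extra = -1
    sqe0's event fires twice:    cq_extra = -1 + 2 = 1, cq_tail = 2
    drain check for sqe2:        seq + cq_extra = 2 + 1 = 3 != cq_tail = 2
                                 --> sqe2 stays deferred
    sqe1 completes:              cq_tail = 3
    drain check for sqe2:        seq + cq_extra = 2 + 1 = 3 == cq_tail = 3
                                 --> sqe2 may now issue

With the old check, seq = 2 == cq_tail = 2 would already have let sqe2
issue after sqe0's two events alone, before sqe1 completed.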

There are other solutions, like:
  - just track multishot (non -ECANCELED) cqes and don't track
    multishot sqes. This way we include multishot sqes on the left end
    of the equation, which means we have to treat multishot sqes as
    normal ones and therefore keep exactly one cqe for each multishot
    sqe. That's hard to do, since some multishot sqes may have
    triggered several events and then been cancelled, while others
    triggered events but weren't cancelled. We would still need to
    track the number of multishot sqes that haven't been cancelled,
    which makes things complicated.

For the implementation, just do the submission tracking in
io_submit_sqe() --> io_init_req() to keep things simple. Otherwise, if
we did it at each opcode's issue site, we would need to carefully
consider every caller of io_req_complete_failed(), because of tricky
cases like cancelling multishot reqs inside a link.
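
Why does the init-time accounting stay balanced even when a multishot
request fails or is cancelled? Every cqe the request posts, including a
final -ECANCELED one via io_req_complete_failed(), goes through
__io_cqring_fill_event() and bumps the counter back. A small
stand-alone model in plain C (the function names are illustrative, not
the kernel's):

    #include <assert.h>

    static int cq_extra;

    /* models io_init_req(): every multishot submission pre-decrements */
    static void submit_multishot(void)   { cq_extra--; }

    /* models __io_cqring_fill_event(): every cqe posted by a multishot
     * request, including a final -ECANCELED one, increments */
    static void post_multishot_cqe(void) { cq_extra++; }

    int main(void)
    {
        submit_multishot();    /* cq_extra = -1 */
        post_multishot_cqe();  /* event #1      */
        post_multishot_cqe();  /* event #2      */
        post_multishot_cqe();  /* -ECANCELED    */

        /* the request netted cqes - sqes = 3 - 1 = 2, exactly the
         * number of "extra" cqes the drain equation must discount */
        assert(cq_extra == 2);
        return 0;
    }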

Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
---
 fs/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 192463bb977a..a7bd223ce2cc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -423,6 +423,7 @@ struct io_ring_ctx {
 		unsigned		cq_mask;
 		atomic_t		cq_timeouts;
 		unsigned		cq_last_tm_flush;
+		unsigned		cq_extra;
 		unsigned long		cq_check_overflow;
 		struct wait_queue_head	cq_wait;
 		struct fasync_struct	*cq_fasync;
@@ -879,6 +880,8 @@ struct io_op_def {
 	unsigned		needs_async_setup : 1;
 	/* should block plug */
 	unsigned		plug : 1;
+	/* set if opcode may generate multiple cqes */
+	unsigned		multi_cqes : 1;
 	/* size of async data needed, if any */
 	unsigned short		async_size;
 };
@@ -924,6 +927,7 @@ struct io_op_def {
 	[IORING_OP_POLL_ADD] = {
 		.needs_file		= 1,
 		.unbound_nonreg_file	= 1,
+		.multi_cqes		= 1,
 	},
 	[IORING_OP_POLL_REMOVE] = {},
 	[IORING_OP_SYNC_FILE_RANGE] = {
@@ -1186,7 +1190,7 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
 	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
 		struct io_ring_ctx *ctx = req->ctx;
 
-		return seq != ctx->cached_cq_tail
+		return seq + ctx->cq_extra != ctx->cached_cq_tail
 				+ READ_ONCE(ctx->cached_cq_overflow);
 	}
 
@@ -1516,6 +1520,9 @@ static bool __io_cqring_fill_event(struct io_kiocb *req, long res,
 
 	trace_io_uring_complete(ctx, req->user_data, res, cflags);
 
+	if (req->flags & REQ_F_MULTI_CQES)
+		req->ctx->cq_extra++;
+
 	/*
 	 * If we can't get a cq entry, userspace overflowed the
 	 * submission (by quite a lot). Increment the overflow count in
@@ -6504,6 +6511,13 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	req->result = 0;
 	req->work.creds = NULL;
 
+	if (sqe_flags & IOSQE_MULTI_CQES) {
+		ctx->cq_extra--;
+		if (!io_op_defs[req->opcode].multi_cqes) {
+			return -EOPNOTSUPP;
+		}
+	}
+
 	/* enforce forwards compatibility on users */
 	if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) {
 		req->flags = 0;
-- 
1.8.3.1



