IOSQE_SQE_GROUP currently queues members only after the leader has
completed. That ordering exists only to simplify the implementation and
was never part of the UAPI, so it may be relaxed in the future to let
members be queued concurrently with the leader.

However, some resources, such as a kernel buffer, can't cross OPs;
otherwise the buffer is easily leaked if any OP fails or the application
panics.

Add the flag REQ_F_SQE_GROUP_DEP to let members depend on the group
leader explicitly: group members won't be queued until the leader
request has completed, and the leader's CQE is still committed only
after all members' CQEs are posted. This way, the lifetime of a kernel
resource can be aligned with the group leader or the group. One typical
use case is supporting zero copy for a device-internal buffer.

Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
 include/linux/io_uring_types.h | 3 +++
 io_uring/io_uring.c            | 8 +++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c5250e585289..d0972e2a098f 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -469,6 +469,7 @@ enum {
 	REQ_F_BL_NO_RECYCLE_BIT,
 	REQ_F_BUFFERS_COMMIT_BIT,
 	REQ_F_SQE_GROUP_LEADER_BIT,
+	REQ_F_SQE_GROUP_DEP_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -551,6 +552,8 @@ enum {
 	REQ_F_BUFFERS_COMMIT	= IO_REQ_FLAG(REQ_F_BUFFERS_COMMIT_BIT),
 	/* sqe group lead */
 	REQ_F_SQE_GROUP_LEADER	= IO_REQ_FLAG(REQ_F_SQE_GROUP_LEADER_BIT),
+	/* sqe group with members depending on leader */
+	REQ_F_SQE_GROUP_DEP	= IO_REQ_FLAG(REQ_F_SQE_GROUP_DEP_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 45a292445b18..b4f5dac85fa4 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -982,7 +982,13 @@ static void io_complete_group_leader(struct io_kiocb *req)
 	req->grp_refs -= 1;
 	WARN_ON_ONCE(req->grp_refs == 0);
 
-	/* TODO: queue members with leader in parallel */
+	/*
+	 * TODO: queue members with leader in parallel
+	 *
+	 * So far, REQ_F_SQE_GROUP_DEP depends that members are queued
+	 * after leader is completed, which may be changed in future,
+	 * then REQ_F_SQE_GROUP_DEP has to be respected in another way.
+	 */
 	if (req->grp_link)
 		io_queue_group_members(req);
 }
-- 
2.42.0