On Sat, May 11, 2024 at 08:12:08AM +0800, Ming Lei wrote:
> An SQE group is defined as a chain of SQEs starting with the first SQE that
> has IOSQE_SQE_GROUP set and ending with the first subsequent SQE that
> doesn't have it set; in that sense it is similar to a chain of linked SQEs.
>
> Unlike linked SQEs, where each SQE is issued only after the previous one
> has completed, all SQEs in one group are submitted in parallel, so there is
> no dependency among the SQEs within a group.
>
> The first SQE is the group leader, and the other SQEs are group members.
> The whole group shares the single IOSQE_IO_LINK and IOSQE_IO_DRAIN setting
> of the group leader; the two flags are ignored for group members.
>
> When the group is part of a link chain, it isn't submitted until the
> previous SQE or group has completed, and the following SQE or group can't
> start until this group has completed. A failure of any group member fails
> the group leader, so that the link chain can be terminated.
>
> When IOSQE_IO_DRAIN is set on the group leader, all requests in this group
> and all previously submitted requests are drained. Since IOSQE_IO_DRAIN can
> be set on the group leader only, we respect IO_DRAIN by always completing
> the group leader last within the group.
>
> Working together with IOSQE_IO_LINK, SQE groups provide a flexible way to
> support N:M dependencies, for example:
>
> - group A is chained with group B
> - group A has N SQEs
> - group B has M SQEs
>
> then the M SQEs in group B depend on the N SQEs in group A.
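One way to sanity-check the grouping rule described above is a tiny user-space model that only parses sqe->flags. This is a sketch under stated assumptions, not the kernel implementation: F_SQE_GROUP, F_IO_LINK and F_IO_DRAIN are hypothetical local stand-ins for IOSQE_SQE_GROUP, IOSQE_IO_LINK and IOSQE_IO_DRAIN (values chosen for illustration), and classify() and struct layout are made-up names. It splits a batch into "nodes" (a whole group or a single ungrouped SQE) and reads LINK/DRAIN from the leader only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the sqe->flags bits discussed in the patch.
 * Values are illustrative only; real code would use the uapi header. */
#define F_IO_DRAIN  (1u << 1)
#define F_IO_LINK   (1u << 2)
#define F_SQE_GROUP (1u << 7)

#define MAX_SQES 64

/* A "node" is either a whole SQE group or a single ungrouped SQE. */
struct layout {
    int nr_nodes;
    int node_of[MAX_SQES];      /* node index of each SQE */
    int size[MAX_SQES];         /* number of SQEs in each node */
    int linked_next[MAX_SQES];  /* 1 if node i + 1 waits for node i */
    int drain[MAX_SQES];        /* 1 if node i drains earlier requests */
};

/* Model of the grouping rule: a group starts at an SQE with F_SQE_GROUP set
 * and ends with the first subsequent SQE that does not have it set. LINK and
 * DRAIN are taken from the first SQE of the node (the leader) only; the
 * members' copies are ignored. Returns the number of nodes. */
static int classify(const uint8_t *flags, int n, struct layout *out)
{
    int i = 0;

    assert(n >= 0 && n <= MAX_SQES);
    out->nr_nodes = 0;
    while (i < n) {
        int node = out->nr_nodes++;
        int start = i;

        /* SQEs that carry the group flag: the leader and all but the last member. */
        while (i < n && (flags[i] & F_SQE_GROUP))
            out->node_of[i++] = node;
        /* The SQE that terminates the group, or a plain ungrouped SQE. */
        if (i < n)
            out->node_of[i++] = node;

        out->size[node] = i - start;
        out->drain[node] = (flags[start] & F_IO_DRAIN) != 0;
        /* Only the leader's LINK counts, and only if another node follows. */
        out->linked_next[node] = (flags[start] & F_IO_LINK) && i < n;
    }
    return out->nr_nodes;
}
```

For the N:M case, setting F_IO_LINK on the leader of group A makes classify() report linked_next[0] == 1, i.e. all M SQEs of group B wait for all N SQEs of group A, while the same bit on a member has no effect. The model covers only how flags are parsed; it says nothing about runtime completion order, failure propagation or draining.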
>
> N:M dependencies allow some interesting use cases to be handled
> efficiently:
>
> 1) read from multiple files, then write the read data into a single file
>
> 2) read from a single file, then write the read data into multiple files
>
> 3) write the same data into multiple files, then read the data back from
> those files and check that the correct data was written
>
> Also, IOSQE_SQE_GROUP takes the last available bit in sqe->flags, but we
> can still extend sqe->flags with one uring context flag, e.g. by using
> __pad3 for non-uring_cmd OPs and part of uring_cmd_flags for the uring_cmd OP.
>
> Suggested-by: Kevin Wolf <kwolf@xxxxxxxxxx>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>

BTW, I wrote a liburing example, link-grp-cp.c [1], based on SQE groups. It
keeps the queue depth (QD) unchanged and only reorganizes the IOs as follows:

- each group has 4 READ IOs, followed by a single linked WRITE IO that
  writes the data read by the group to the destination file

- the first 12 groups have (4 + 1) IOs, and the last group has (3 + 1) IOs

Running the example to copy between two block devices (from virtio-blk to
virtio-scsi in my test VM):

1) buffered copy:
- perf is improved by 5%

2) direct IO mode:
- perf is improved by 27%

[1] link-grp-cp.c example
https://github.com/ming1/liburing/commits/sqe_group_v2/

[2] one bug fix (top commit) against V3
https://github.com/ming1/linux/commits/io_uring_sqe_group_v3/

Thanks,
Ming
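A rough sketch of how the sqe->flags of one link-grp-cp.c batch might be laid out, following the description of 12 units of (4 + 1) IOs plus one unit of (3 + 1) IOs, i.e. 12 × (4 + 1) + (3 + 1) = 64 SQEs. This is not code taken from link-grp-cp.c: add_copy_unit(), build_copy_batch() and the F_* values are hypothetical, and a real program would also set opcodes, buffers and offsets through liburing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the sqe->flags bits (illustrative values only). */
#define F_IO_LINK   (1u << 2)
#define F_SQE_GROUP (1u << 7)

#define MAX_SQES 64

/* Hypothetical helper: append nr_reads READ SQEs submitted in parallel as one
 * SQE group, followed by one WRITE SQE linked behind the whole group.
 * (With nr_reads == 1 the single read is simply linked, with no group.)
 * Returns the new number of SQEs in flags[]. */
static int add_copy_unit(uint8_t *flags, int pos, int nr_reads)
{
    assert(nr_reads >= 1 && pos + nr_reads + 1 <= MAX_SQES);
    for (int k = 0; k < nr_reads; k++) {
        /* All reads but the last carry the group flag; the last one ends the group. */
        uint8_t f = (k < nr_reads - 1) ? F_SQE_GROUP : 0;
        if (k == 0)
            f |= F_IO_LINK; /* leader's LINK chains the whole group to the write */
        flags[pos++] = f;
    }
    flags[pos++] = 0; /* WRITE: starts only after every read in the group completes */
    return pos;
}

/* Batch shape described for the example: 12 units of 4 + 1 SQEs and a final
 * unit of 3 + 1 SQEs. Returns the total number of SQEs. */
static int build_copy_batch(uint8_t *flags)
{
    int n = 0;

    for (int u = 0; u < 12; u++)
        n = add_copy_unit(flags, n, 4);
    return add_copy_unit(flags, n, 3);
}
```

Each unit is an N:1 case of the N:M dependency described in the patch: the write depends on all reads of its group, while different units have no dependency on each other.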