On Thu, Aug 8, 2024 at 9:24 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> Hello,
>
> The first 3 patches are cleanups and prepare for adding sqe group support.
>
> The 4th patch adds a generic sqe group, which is like a link chain but
> allows each sqe in the group to be issued in parallel; the group shares
> the same IO_LINK & IO_DRAIN boundary, so N:M dependencies can be built
> by combining sqe groups with io links. sqe group changes nothing about
> IOSQE_IO_LINK.
>
> The 5th patch adds one variant of sqe group: members may depend on the
> group leader, so that kernel resource lifetime can be aligned with the
> group leader or the whole group. Any kernel resource can then be shared
> within the sqe group and used for generic device zero copy.
>
> The 6th & 7th patches support providing an sqe group buffer via this
> group variant.
>
> The 8th patch adds ublk zero copy based on the io_uring-provided sqe
> group buffer.
>

Hi Ming,

Thanks for working on this feature. I have tested this entire v5 series
for the Android OTA path to evaluate ublk zero copy. A small userspace
sketch of how I read the group submission semantics, with my assumptions
called out, is included after the quoted test notes below.

Tested-by: Akilesh Kailash <akailash@xxxxxxxxxx>

> Tests:
>
> 1) pass the liburing tests
>    - make runtests
>
> 2) write/pass two sqe group test cases:
>
>    https://github.com/axboe/liburing/compare/master...ming1:liburing:sqe_group_v2
>
>    - cover the related sqe flag combinations and linked groups, with both
>      nop and a multi-destination file copy
>
>    - cover failure handling: fail the leader IO or a member IO in both a
>      single group and linked groups, for each sqe flag combination tested
>
> 3) ublksrv zero copy:
>
>    ublksrv userspace implements zero copy via sqe group & provide group
>    kbuf:
>
>    git clone https://github.com/ublk-org/ublksrv.git -b group-provide-buf_v2
>    make test T=loop/009:nbd/061   # ublk zc tests
>
>    When running 64KB/512KB block size tests on ublk-loop
>    ('ublk add -t loop --buffered_io -f $backing'), performance is
>    observed to double.
>
> Any comments are welcome!
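Below is how I understand the userspace side of group submission: a minimal
sketch of the multi-destination copy case named in the liburing tests above,
not code taken from this series. It assumes the IOSQE_SQE_GROUP flag added by
the patches (not in mainline liburing headers; the value is assumed to be the
last free sqe flag bit, per the V2 notes below), assumes the group is
delimited like IOSQE_IO_LINK (flag set on every sqe of the group except the
last one), and assumes members are issued only after the leader completes,
per the v5 note about not issuing leader and members in parallel.

#include <liburing.h>

#ifndef IOSQE_SQE_GROUP
#define IOSQE_SQE_GROUP (1U << 7)	/* assumed: last free sqe flag bit */
#endif

/*
 * Group leader reads one block from src_fd; the two members write the same
 * buffer to two destinations.  Error handling is omitted for brevity.
 */
static int queue_copy_group(struct io_uring *ring, int src_fd,
			    int dst_fd1, int dst_fd2,
			    void *buf, unsigned int len, __u64 off)
{
	struct io_uring_sqe *sqe;

	/* leader: starts the group */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_read(sqe, src_fd, buf, len, off);
	sqe->flags |= IOSQE_SQE_GROUP;

	/* member: first destination */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_write(sqe, dst_fd1, buf, len, off);
	sqe->flags |= IOSQE_SQE_GROUP;

	/* member: second destination, last sqe of the group (no group flag) */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_write(sqe, dst_fd2, buf, len, off);

	return io_uring_submit(ring);
}

If I read the cover letter right, setting IOSQE_IO_LINK for the group would
make a following sqe wait for the whole group to complete, which is how the
N:M dependency is expressed; please correct me if the uapi differs.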
>
> V5:
> - follow Pavel's suggestion to minimize changes on the io_uring fast code
>   path: sqe group code is called by a single 'if (unlikely())' from both
>   the issue & completion code paths
>
> - simplify & re-write group request completion
>     avoid touching io-wq code by completing the group leader via tw
>     directly, just like ->task_complete
>
>     re-write group member & leader completion handling; one simplification
>     is to always free the leader via the last member
>
>     simplify queueing of group members; issuing the leader and members in
>     parallel is not supported
>
> - fail the whole group if IO_*LINK & IO_DRAIN is set on group members, and
>   add test code to cover this change
>
> - misc cleanup
>
> V4:
> - address most comments from Pavel
> - fix a request double free
> - don't use io_req_commit_cqe() in io_req_complete_defer()
> - make members' REQ_F_INFLIGHT discoverable
> - use a common assembling check in the submission code path
> - drop patch 3 and don't move REQ_F_CQE_SKIP out of io_free_req()
> - don't set .accept_group_kbuf for net send zc, where members need to be
>   queued after the buffer notification is received; this can be enabled in
>   the future
> - add the .grp_leader field via a union, sharing storage with .grp_link
> - move .grp_refs into a hole of io_kiocb, so that no extra cacheline is
>   needed for io_kiocb
> - cleanup & documentation improvements
>
> V3:
> - add IORING_FEAT_SQE_GROUP
> - simplify group completion, and minimize changes to io_req_complete_defer()
> - simplify & clean up io_queue_group_members()
> - fix many failure handling issues
> - cover failure handling code in the added liburing tests
> - remove RFC
>
> V2:
> - add generic sqe group, suggested by Kevin Wolf
> - add REQ_F_SQE_GROUP_DEP, which is based on IOSQE_SQE_GROUP, for sharing
>   kernel resources group-wide, suggested by Kevin Wolf
> - remove the sqe ext flag and use the last bit for IOSQE_SQE_GROUP (Pavel);
>   in the future we can still extend sqe flags with a uring context flag
> - initialize group requests via the submit state pattern, suggested by Pavel
> - all kinds of cleanups & bug fixes
>
> Ming Lei (8):
>   io_uring: add io_link_req() helper
>   io_uring: add io_submit_fail_link() helper
>   io_uring: add helper of io_req_commit_cqe()
>   io_uring: support SQE group
>   io_uring: support sqe group with members depending on leader
>   io_uring: support providing sqe group buffer
>   io_uring/uring_cmd: support provide group kernel buffer
>   ublk: support provide io buffer
>
>  drivers/block/ublk_drv.c       | 160 ++++++++++++++-
>  include/linux/io_uring/cmd.h   |   7 +
>  include/linux/io_uring_types.h |  54 +++++
>  include/uapi/linux/io_uring.h  |  11 +-
>  include/uapi/linux/ublk_cmd.h  |   7 +-
>  io_uring/io_uring.c            | 359 ++++++++++++++++++++++++++++++---
>  io_uring/io_uring.h            |  16 ++
>  io_uring/kbuf.c                |  60 ++++++
>  io_uring/kbuf.h                |  13 ++
>  io_uring/net.c                 |  23 ++-
>  io_uring/opdef.c               |   4 +
>  io_uring/opdef.h               |   2 +
>  io_uring/rw.c                  |  20 +-
>  io_uring/timeout.c             |   2 +
>  io_uring/uring_cmd.c           |  28 +++
>  15 files changed, 720 insertions(+), 46 deletions(-)
>
> --
> 2.42.0
>
>