On Fri, Aug 9, 2024 at 12:25 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> Hello,
>
> The 1st 3 patches are cleanups and prepare for adding sqe group.
>
> The 4th patch supports a generic sqe group, which is like a link chain, but
> allows each sqe in the group to be issued in parallel, and the group shares
> the same IO_LINK & IO_DRAIN boundary, so N:M dependencies can be supported
> with sqe group & io link together. sqe group changes nothing on
> IOSQE_IO_LINK.
>
> The 5th patch supports one variant of sqe group: allow members to depend
> on the group leader, so that kernel resource lifetime can be aligned with
> the group leader or the group; then any kernel resource can be shared in
> this sqe group, and can be used for generic device zero copy.
>
> The 6th & 7th patches support providing an sqe group buffer via the sqe
> group variant.
>
> The 8th patch supports ublk zero copy based on io_uring's provided sqe
> group buffer.
>
> Tests:
>
> 1) pass liburing tests
> - make runtests
>
> 2) write/pass two sqe group test cases:
>
> https://github.com/axboe/liburing/compare/master...ming1:liburing:sqe_group_v2
>
> - covers related sqe flag combinations and linking groups, with both nop
>   and one multi-destination file copy.
>
> - covers failure handling tests: fail leader IO or member IO in both a
>   single group and linked groups, done for each sqe flag combination
>   tested
>
> 3) ublksrv zero copy:
>
> ublksrv userspace implements zero copy via sqe group & provided group
> kbuf:
>
> git clone https://github.com/ublk-org/ublksrv.git -b group-provide-buf_v2
> make test T=loop/009:nbd/061 #ublk zc tests
>
> When running the 64KB/512KB block size test on ublk-loop ('ublk add -t loop
> --buffered_io -f $backing'), perf is observed to double.
>
> Any comments are welcome!
>
> V5:
> - follow Pavel's suggestion to minimize changes on the io_uring fast code
>   path: sqe group code is called via a single 'if (unlikely())' from
>   both the issue & completion code paths
>
> - simplify & re-write group request completion
>   avoid touching io-wq code by completing the group leader via tw
>   directly, just like ->task_complete
>
>   re-write group member & leader completion handling; one
>   simplification is to always free the leader via the last member
>
>   simplify queueing of group members; issuing the leader and members
>   in parallel is not supported
>
> - fail the whole group if IO_*LINK & IO_DRAIN is set on group
>   members, and add test code to cover this change
>
> - misc cleanup

Hi Pavel,

V5 should address all your comments on V4, so care to take a look?

Thanks,
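
[Editorial note, not part of the original thread: the sketch below illustrates
the grouping model described in the quoted cover letter. It assumes the
IOSQE_SQE_GROUP sqe flag introduced by the proposed (not yet upstream) series
and a link-chain-style grouping rule in which the flag is set on the leader and
every member except the last; the flag value, the function name
submit_nop_group, and the nop payloads are illustrative assumptions, and error
checks are omitted.]

    #include <liburing.h>

    /*
     * IOSQE_SQE_GROUP is not in upstream liburing headers; the value below
     * assumes the series uses the last free sqe flag bit.
     */
    #ifndef IOSQE_SQE_GROUP
    #define IOSQE_SQE_GROUP	(1U << 7)
    #endif

    static int submit_nop_group(struct io_uring *ring)
    {
    	struct io_uring_sqe *sqe;

    	/* leader: first sqe carrying the group flag */
    	sqe = io_uring_get_sqe(ring);
    	io_uring_prep_nop(sqe);
    	sqe->flags |= IOSQE_SQE_GROUP;

    	/* member: keeps the flag, so the group continues */
    	sqe = io_uring_get_sqe(ring);
    	io_uring_prep_nop(sqe);
    	sqe->flags |= IOSQE_SQE_GROUP;

    	/* last member: flag cleared, which terminates the group */
    	sqe = io_uring_get_sqe(ring);
    	io_uring_prep_nop(sqe);

    	/*
    	 * Per the cover letter: the leader is issued first (V5 does not
    	 * issue leader and members in parallel), the members may then run
    	 * in parallel with each other, and the whole group shares one
    	 * IO_LINK/IO_DRAIN boundary.
    	 */
    	return io_uring_submit(ring);
    }

Only the group flag and the leader-first issue order come from the series
itself; everything else in the sketch is scaffolding.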