On 8/17/24 05:16, Ming Lei wrote:
On Fri, Aug 9, 2024 at 12:25 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
Hello,
The first 3 patches are cleanups that prepare for adding sqe group.
The 4th patch supports the generic sqe group, which is like a link chain but
allows each sqe in the group to be issued in parallel. The group shares the
same IO_LINK & IO_DRAIN boundary, so N:M dependencies can be supported with
sqe group & io link together. sqe group changes nothing on
IOSQE_IO_LINK.
The 5th patch supports one variant of sqe group: members are allowed to depend
on the group leader, so that a kernel resource's lifetime can be aligned with
the group leader or the group. Any kernel resource can then be shared within
this sqe group, which can be used for generic device zero copy.
The 6th & 7th patches support providing an sqe group buffer via the sqe
group variant.
The 8th patch supports ublk zero copy, based on io_uring providing the sqe
group buffer.
Tests:
1) liburing tests pass
- make runtests
2) two new sqe group test cases are written and pass:
https://github.com/axboe/liburing/compare/master...ming1:liburing:sqe_group_v2
- cover the related sqe flag combinations and linked groups, with both nop and
a multi-destination file copy.
- cover failure handling: fail the leader IO or a member IO in both a single
group and linked groups, which is done in each sqe flag combination
test
3) ublksrv zero copy:
ublksrv userspace implements zero copy via sqe group & providing group
kbuf:
git clone https://github.com/ublk-org/ublksrv.git -b group-provide-buf_v2
make test T=loop/009:nbd/061 #ublk zc tests
When running the 64KB/512KB block size tests on ublk-loop ('ublk add -t loop --buffered_io -f $backing'),
performance is observed to double.
Any comments are welcome!
V5:
- follow Pavel's suggestion to minimize changes to the io_uring fast code
path: sqe group code is called via a single 'if (unlikely())' from
both the issue & completion code paths
- simplify & re-write group request completion
  - avoid touching io-wq code by completing the group leader via tw
    directly, just like ->task_complete
  - re-write group member & leader completion handling; one
    simplification is to always free the leader via the last member
  - simplify queueing of group members, and do not support issuing the
    leader and members in parallel
- fail the whole group if IO_*LINK & IO_DRAIN is set on group
members, and add test code to cover this change
- misc cleanup
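
For readers following the V5 completion changes, the "free the leader via the last member" rule can be pictured with a small userspace model. This is only a sketch under stated assumptions, not code from the patches: struct toy_group and the toy_* helpers are hypothetical names, the group is assumed to have at least one member, and the leader is assumed to complete before any member does, since V5 does not issue the leader and members in parallel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one sqe group; not kernel code. */
struct toy_group {
	int pending_members;	/* members that have not completed yet */
	bool leader_done;	/* the leader's own IO has finished */
	bool leader_freed;	/* the leader's resources were released */
};

/* Assumption: a group always has at least one member. */
static void toy_group_init(struct toy_group *g, int nr_members)
{
	assert(nr_members > 0);
	g->pending_members = nr_members;
	g->leader_done = false;
	g->leader_freed = false;
}

/*
 * The leader finishes first. It is NOT freed here: anything it owns
 * stays alive until every member has completed.
 */
static void toy_leader_complete(struct toy_group *g)
{
	g->leader_done = true;
}

/*
 * A member finishes. Returns true if this was the last member, in
 * which case the leader is released as part of the same step.
 */
static bool toy_member_complete(struct toy_group *g)
{
	assert(g->leader_done);		/* members run after the leader */
	assert(g->pending_members > 0);

	if (--g->pending_members == 0) {
		g->leader_freed = true;
		return true;
	}
	return false;
}
```

With two members, calling toy_leader_complete() and then toy_member_complete() twice leaves the leader alive after the first member and releases it on the second. Having a single release point is what makes the rule simple: no completion path other than the last member ever has to decide whether the leader can go away.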
Hi Pavel,
V5 should address all your comments on V4, so care to take a look?
I will, didn't forget about it, but thanks for the reminder.
--
Pavel Begunkov