Re: [PATCH 5/9] io_uring: support SQE group

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/7/24 7:03 PM, Ming Lei wrote:
> SQE group is defined as one chain of SQEs starting with the first sqe that
> has IOSQE_EXT_SQE_GROUP set, and ending with the first subsequent sqe that
> doesn't have it set, and it is similar with chain of linked sqes.
> 
> The 1st SQE is group leader, and the other SQEs are group member. The group
> leader is always freed after all members are completed. Group members
> aren't submitted until the group leader is completed, and there isn't any
> dependency among group members, and IOSQE_IO_LINK can't be set for group
> members, same with IOSQE_IO_DRAIN.
> 
> Typically the group leader provides or makes resource, and the other members
> consume the resource, such as scenario of multiple backup, the 1st SQE is to
> read data from source file into fixed buffer, the other SQEs write data from
> the same buffer into other destination files. SQE group provides very
> efficient way to complete this task: 1) fs write SQEs and fs read SQE can be
> submitted in single syscall, no need to submit fs read SQE first, and wait
> until read SQE is completed, 2) no need to link all write SQEs together, then
> write SQEs can be submitted to files concurrently. Meantime application is
> simplified a lot in this way.
> 
> Another use case is to for supporting generic device zero copy:
> 
> - the lead SQE is for providing device buffer, which is owned by device or
>   kernel, can't be cross userspace, otherwise easy to cause leak for devil
>   application or panic
> 
> - member SQEs reads or writes concurrently against the buffer provided by lead
>   SQE

In concept, this looks very similar to "sqe bundles" that I played with
in the past:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-bundle

Didn't look too closely yet at the implementation, but in spirit it's
about the same in that the first entry is processed first, and there's
no ordering implied between the test of the members of the bundle /
group.

I do think that's a flexible thing to support, particularly if:

1) We can do it more efficiently than links, which are pretty horrible.
2) It enables new worthwhile use cases
3) It's done cleanly 
4) It's easily understandable and easy to document, so that users will
   actually understand what this is and what use cases it enable. Part
   of that is actually naming, it should be readily apparent what a
   group is, what the lead is, and what the members are. Using your
   terminology here, definitely worth spending some time on that to get
   it just right and self evident.

-- 
Jens Axboe





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux