Re: [PATCH V3 5/9] io_uring: support SQE group


On Mon, Jun 10, 2024 at 02:55:22AM +0100, Pavel Begunkov wrote:
> On 5/21/24 03:58, Ming Lei wrote:
> > On Sat, May 11, 2024 at 08:12:08AM +0800, Ming Lei wrote:
> > > An SQE group is defined as a chain of SQEs starting with the first SQE that
> > > has IOSQE_SQE_GROUP set and ending with the first subsequent SQE that
> > > doesn't have it set, similar to a chain of linked SQEs.
> > > 
> > > Unlike linked SQEs, where each SQE is issued after the previous one has
> > > completed, all SQEs in one group are submitted in parallel, so there is no
> > > dependency among SQEs in the same group.
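The group boundary rule above can be modeled in a few lines of plain C. This is a minimal sketch, not kernel or liburing code: `find_sqe_groups()` is a hypothetical helper that scans an array of `sqe->flags` bytes and tags each SQE with its group index. The value of `IOSQE_SQE_GROUP` (bit 7, the last bit of the 8-bit `sqe->flags`) is an assumption based on this patch; the flag is not in mainline headers. Treating an unterminated group at the end of the array as an error is also a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: IOSQE_SQE_GROUP is bit 7, the last bit of the 8-bit
 * sqe->flags, as proposed in this patch; mainline headers lack it. */
#ifndef IOSQE_SQE_GROUP
#define IOSQE_SQE_GROUP (1U << 7)
#endif

/*
 * Hypothetical helper: tag each SQE with the index of the group it
 * belongs to, or -1 if it is not in a group. A group starts at an SQE
 * with IOSQE_SQE_GROUP set (the leader) and ends at the first later SQE
 * without it (the last member). Returns the number of groups, or -1 if
 * the array ends inside an unterminated group (an error in this sketch).
 */
int find_sqe_groups(const uint8_t *flags, int n, int *group_of)
{
    int ngroups = 0;
    int in_group = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (in_group) {
            /* Any SQE after the leader is a member of the current group. */
            group_of[i] = ngroups - 1;
            /* The first member without the flag closes the group. */
            if (!(flags[i] & IOSQE_SQE_GROUP))
                in_group = 0;
        } else if (flags[i] & IOSQE_SQE_GROUP) {
            /* Flag set outside a group: this SQE leads a new group. */
            group_of[i] = ngroups++;
            in_group = 1;
        } else {
            /* Plain SQE, not part of any group. */
            group_of[i] = -1;
        }
    }
    return in_group ? -1 : ngroups;
}
```

For flags `{0, GROUP, GROUP, 0, 0}` the sketch puts SQEs 1 to 3 in group 0 and leaves SQEs 0 and 4 ungrouped; SQE 3 is a member even though its own flag is clear, because it is what ends the group.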
> > > 
> > > The first SQE is the group leader, and the other SQEs are group members. The
> > > whole group shares a single IOSQE_IO_LINK and IOSQE_IO_DRAIN taken from the
> > > group leader; the two flags are ignored for group members.
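A minimal sketch of the flag-sharing rule, assuming `IOSQE_SQE_GROUP` is bit 7 as proposed in this patch. `normalize_group_flags()` is a made-up name: it clears LINK/DRAIN on members, since they are ignored there, and returns the LINK/DRAIN setting the group inherits from its leader. The `IOSQE_IO_DRAIN` and `IOSQE_IO_LINK` values match mainline `<linux/io_uring.h>`.

```c
#include <assert.h>
#include <stdint.h>

#ifndef IOSQE_IO_DRAIN
#define IOSQE_IO_DRAIN  (1U << 1)   /* mainline value */
#endif
#ifndef IOSQE_IO_LINK
#define IOSQE_IO_LINK   (1U << 2)   /* mainline value */
#endif
#ifndef IOSQE_SQE_GROUP
#define IOSQE_SQE_GROUP (1U << 7)   /* assumption: last bit of sqe->flags */
#endif

/* Bits the whole group takes from its leader. */
#define GROUP_SHARED_FLAGS (IOSQE_IO_LINK | IOSQE_IO_DRAIN)

/*
 * Hypothetical helper: normalize the flags of one group in place.
 * group[0] is the leader and group[len - 1] the last member. LINK/DRAIN
 * on members are ignored, so they are cleared here; the return value is
 * the LINK/DRAIN setting that applies to the group as a unit.
 */
uint8_t normalize_group_flags(uint8_t *group, int len)
{
    assert(len >= 2);                    /* leader plus at least one member */
    assert(group[0] & IOSQE_SQE_GROUP);  /* leader must start the group */

    for (int i = 1; i < len; i++)
        group[i] &= (uint8_t)~GROUP_SHARED_FLAGS;

    return (uint8_t)(group[0] & GROUP_SHARED_FLAGS);
}
```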
> > > 
> > > When the group is part of a link chain, it isn't submitted until the
> > > previous SQE or group has completed, and the following SQE or group can't
> > > be started until this group has completed. Failure of any group member
> > > fails the group leader, so the rest of the link chain can be terminated.
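The failure rule can be modeled as follows. This is a sketch under stated assumptions, not the patch's implementation: `struct group_result`, `group_final_result()` and `run_group_chain()` are hypothetical; the sketch reports the first negative result as the group's error (which error the leader actually carries is not specified here); and it marks the rest of the chain `-ECANCELED`, which is what mainline io_uring does for requests following a failed link.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical record of one group's per-request results. */
struct group_result {
    int leader_res;         /* result of the leader's own I/O */
    const int *member_res;  /* results of the members */
    int nr_members;
};

/*
 * A failed member fails the leader. Sketch choice: report the first
 * negative result (leader first, then members in order).
 */
int group_final_result(const struct group_result *g)
{
    if (g->leader_res < 0)
        return g->leader_res;
    for (int i = 0; i < g->nr_members; i++)
        if (g->member_res[i] < 0)
            return g->member_res[i];
    return g->leader_res;
}

/*
 * Walk a link chain of n groups. out[i] gets group i's final result.
 * Once a group fails, later groups never start and are reported as
 * -ECANCELED, as mainline io_uring does for the rest of a failed link
 * chain. Returns how many groups actually ran.
 */
int run_group_chain(const struct group_result *chain, int n, int *out)
{
    int ran = 0;
    int failed = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (failed) {
            out[i] = -ECANCELED;
            continue;
        }
        out[i] = group_final_result(&chain[i]);
        ran++;
        failed = out[i] < 0;
    }
    return ran;
}
```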
> > > 
> > > When IOSQE_IO_DRAIN is set for the group leader, all requests in this group
> > > and all previously submitted requests are drained. Since IOSQE_IO_DRAIN can
> > > be set only for the group leader, we respect IO_DRAIN by always completing
> > > the group leader last in the group.
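One way to picture the "leader completes last" rule is a per-group count of in-flight requests, where the leader's CQE is posted only when that count drops to zero. The sketch assumes members post their CQEs as soon as they finish; `struct sqe_group`, `group_init()` and `group_io_done()` are hypothetical names, and `MAX_GROUP` is an arbitrary limit for the example. Mainline IOSQE_IO_DRAIN also keeps later SQEs from starting until the drained request completes, so completing the leader last makes them wait for the whole group.

```c
#include <assert.h>

#define MAX_GROUP 16  /* arbitrary limit for this example */

/* Hypothetical per-group state; request 0 is the leader. */
struct sqe_group {
    int nr;                    /* leader + members */
    int pending;               /* requests whose I/O is still in flight */
    int cqe_order[MAX_GROUP];  /* request indices in CQE posting order */
    int nr_cqes;
};

void group_init(struct sqe_group *g, int nr)
{
    assert(nr >= 2 && nr <= MAX_GROUP);  /* leader plus >= 1 member */
    g->nr = nr;
    g->pending = nr;
    g->nr_cqes = 0;
}

/* Report that request idx finished its I/O; call once per request. */
void group_io_done(struct sqe_group *g, int idx)
{
    assert(idx >= 0 && idx < g->nr && g->pending > 0);

    /* Assumption: a member's CQE is posted as soon as it finishes. */
    if (idx != 0)
        g->cqe_order[g->nr_cqes++] = idx;

    /* The leader's CQE waits until nothing in the group is in flight,
     * so it is always the last CQE of the group. */
    if (--g->pending == 0)
        g->cqe_order[g->nr_cqes++] = 0;
}
```

Even when the leader's own I/O finishes first, its CQE is only posted after the last member completes.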
> > > 
> > > Combined with IOSQE_IO_LINK, SQE groups provide a flexible way to
> > > support N:M dependencies, such as:
> > > 
> > > - group A is chained with group B together
> > > - group A has N SQEs
> > > - group B has M SQEs
> > > 
> > > then the M SQEs in group B depend on the N SQEs in group A.
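As a sketch of how such an N:M chain might be laid out in the SQ ring: `emit_unit()` and `build_n_to_m()` are hypothetical helpers that write only the `sqe->flags` bytes. Group A's leader carries `IOSQE_IO_LINK`, and the last SQE of each group clears `IOSQE_SQE_GROUP` to end it. `IOSQE_SQE_GROUP` is assumed to be bit 7 as proposed in this patch; emitting a "group" of one SQE as a plain SQE is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#ifndef IOSQE_IO_LINK
#define IOSQE_IO_LINK   (1U << 2)  /* mainline value */
#endif
#ifndef IOSQE_SQE_GROUP
#define IOSQE_SQE_GROUP (1U << 7)  /* assumption: bit 7, per this patch */
#endif

/*
 * Hypothetical helper: write the flags for one unit of a link chain
 * starting at flags[pos]. count == 1 emits a plain SQE; count >= 2 emits
 * a group, leader first. If link is set, the leader carries IOSQE_IO_LINK
 * so the next unit waits for this one. Returns the next free position.
 */
int emit_unit(uint8_t *flags, int pos, int count, int link)
{
    uint8_t link_bit = link ? IOSQE_IO_LINK : 0;

    assert(count >= 1);
    if (count == 1) {
        flags[pos] = link_bit;
        return pos + 1;
    }
    flags[pos++] = (uint8_t)(IOSQE_SQE_GROUP | link_bit);  /* leader */
    for (int i = 1; i < count - 1; i++)
        flags[pos++] = IOSQE_SQE_GROUP;                    /* middle members */
    flags[pos++] = 0;                     /* last member ends the group */
    return pos;
}

/* Lay out "group A (n SQEs) -> group B (m SQEs)"; returns total SQEs. */
int build_n_to_m(uint8_t *flags, int n, int m)
{
    int pos = emit_unit(flags, 0, n, 1);
    return emit_unit(flags, pos, m, 0);
}
```

With n = 3 and m = 2 this yields five flag bytes, `GROUP|LINK, GROUP, 0, GROUP, 0`: the two SQEs of group B start only after all three SQEs of group A have completed.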
> > > 
> > > N:M dependencies can support some interesting use cases efficiently:
> > > 
> > > 1) read from multiple files, then write the read data into a single file
> > > 
> > > 2) read from a single file, then write the read data into multiple files
> > > 
> > > 3) write the same data into multiple files, then read it back from those
> > > files and check that the correct data was written
> > > 
> > > IOSQE_SQE_GROUP takes the last bit in sqe->flags, but we can still
> > > extend sqe->flags with a uring context flag, e.g. by using __pad3 for
> > > non-uring_cmd OPs and part of uring_cmd_flags for the uring_cmd OP.
> > > 
> > > Suggested-by: Kevin Wolf <kwolf@xxxxxxxxxx>
> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > 
> > BTW, I wrote a liburing example, link-grp-cp.c, based on SQE groups. It
> > keeps QD unchanged and just reorganizes IOs in the following way:
> > 
> > - each group has 4 READ IOs and is linked to a single WRITE IO, which
> >    writes the data read by the SQE group to the destination file
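The per-batch layout described for link-grp-cp.c can be sketched as an I/O plan. This does not reproduce that example's source: `struct plan_sqe`, `plan_copy_batch()` and the `PLAN_*` ops are hypothetical, source and destination offsets are assumed to be equal (a plain copy), and `IOSQE_SQE_GROUP` is assumed to be bit 7 as proposed in this patch. Each batch is 4 reads of `bs` bytes in one group, whose leader is linked to a single write of `4 * bs` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef IOSQE_IO_LINK
#define IOSQE_IO_LINK   (1U << 2)  /* mainline value */
#endif
#ifndef IOSQE_SQE_GROUP
#define IOSQE_SQE_GROUP (1U << 7)  /* assumption: bit 7, per this patch */
#endif

#define READS_PER_GROUP 4  /* as described for link-grp-cp.c */

enum plan_op { PLAN_READ, PLAN_WRITE };

/* Hypothetical description of one SQE to be prepared later. */
struct plan_sqe {
    enum plan_op op;
    uint64_t offset;  /* file offset (same for src and dst, by assumption) */
    size_t len;
    size_t buf_off;   /* offset into the batch buffer */
    uint8_t flags;
};

/*
 * Plan one batch of the copy starting at file offset off with block size
 * bs: READS_PER_GROUP reads as one SQE group, linked to a single write of
 * the whole batch. Returns the number of SQEs written to out.
 */
int plan_copy_batch(struct plan_sqe *out, uint64_t off, size_t bs)
{
    int n = 0;

    assert(bs > 0);
    for (int i = 0; i < READS_PER_GROUP; i++) {
        uint8_t flags;

        if (i == 0)
            flags = IOSQE_SQE_GROUP | IOSQE_IO_LINK;  /* leader links to write */
        else if (i < READS_PER_GROUP - 1)
            flags = IOSQE_SQE_GROUP;                  /* middle member */
        else
            flags = 0;                  /* last member ends the group */

        out[n++] = (struct plan_sqe){
            .op = PLAN_READ,
            .offset = off + (uint64_t)i * bs,
            .len = bs,
            .buf_off = (size_t)i * bs,
            .flags = flags,
        };
    }
    /* One write of everything the group read; it starts after the group. */
    out[n++] = (struct plan_sqe){
        .op = PLAN_WRITE,
        .offset = off,
        .len = READS_PER_GROUP * bs,
        .buf_off = 0,
        .flags = 0,
    };
    return n;
}
```

Turning each `plan_sqe` into a real SQE with liburing's `io_uring_prep_read()`/`io_uring_prep_write()` and `io_uring_sqe_set_flags()` would, on a kernel carrying this patch, let all five SQEs of a batch go out in one `io_uring_submit()` call.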
> 
> IIUC it's comparing 1 large write request with 4 small, and

It is actually reasonable from the storage device's viewpoint: concurrent
small READs are often faster than a single big READ, but concurrent small
WRITEs are usually slower than a single big WRITE.

> it's not exactly anything close to fair. And you can do the same
> in userspace (without links). And having control in userspace

No, you can't do it with a single syscall.


Thanks, 
Ming




