On Tue, 2022-08-30 at 21:19 +0800, Hao Xu wrote:
> On 8/19/22 20:19, Dylan Yudaken wrote:
> > Allow deferring async tasks until the user calls io_uring_enter(2)
> > with the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag
> > at io_uring_setup time. This functionality requires that the later
> > io_uring_enter will be called from the same submission task, and
> > therefore restrict this flag to work only when
> > IORING_SETUP_SINGLE_ISSUER is also set.
> >
> > Being able to hand pick when tasks are run prevents the problem
> > where there is current work to be done, however task work runs
> > anyway.
> >
> > For example, a common workload would obtain a batch of CQEs, and
> > process each one. Interrupting this to run additional task work
> > would add latency but not gain anything. If instead task work is
> > deferred to just before more CQEs are obtained then no additional
> > latency is added.
> >
> > The way this is implemented is by trying to keep task work local to
> > an io_ring_ctx, rather than to the submission task. This is
> > required, as the application will want to wake up only a single
> > io_ring_ctx at a time to process work, and so the lists of work
> > have to be kept separate.
> >
> > This has some other benefits like not having to check the task
> > continually in handle_tw_list (and potentially unlocking/locking
> > those), and reducing locks in the submit & process completions
> > path.
> >
> > There are networking cases where using this option can reduce
> > request latency by 50%. For example a contrived example using [1]
> > where the client sends 2k data and receives the same data back
> > while doing some system calls (to trigger task work) shows this
> > reduction. The reason ends up being that if sending responses is
> > delayed by processing task work, then the client side sits idle.
> > Whereas reordering the sends first means that the client runs its
> > workload in parallel with the local task work.
> >
>
> Sorry, seems I misunderstood the purpose of this patchset. Allow me
> to ask a question: "we always first submit sqes then handle task work
> (in IORING_SETUP_COOP_TASKRUN mode), how could the sending be
> interrupted by task works?"

IORING_SETUP_COOP_TASKRUN causes the task to not be interrupted simply
for task work. However, task work will still be run on every system
call, even if completions are not about to be processed.

io_uring task work (unlike, say, epoll wakeups) can take a non-trivial
amount of time, and so running it closer to when its results are used
can reduce the latency of other unrelated operations by not
unnecessarily stalling them.
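
To make the ordering effect concrete, here is a toy user-space model in
C. It is only a sketch, not kernel code: the toy_* names, the work
costs, and the simulated clock are all hypothetical and exist purely
for illustration. It assumes two pieces of task work (100us each) are
already pending when the application issues a send (10us), and compares
when that send goes out with and without deferral.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORK 16

/* One pending piece of task work; cost_us is a made-up processing time. */
struct work_item {
    int cost_us;
};

/* Toy stand-in for a ring context that keeps its own local work list. */
struct toy_ctx {
    struct work_item pending[MAX_WORK];
    size_t nr_pending;
    int defer;        /* 1: run task work only when events are requested */
    long elapsed_us;  /* simulated clock */
};

static void toy_queue_work(struct toy_ctx *ctx, int cost_us)
{
    assert(ctx->nr_pending < MAX_WORK);
    ctx->pending[ctx->nr_pending++].cost_us = cost_us;
}

/* Drain the local work list, charging each item to the simulated clock. */
static void toy_run_work(struct toy_ctx *ctx)
{
    for (size_t i = 0; i < ctx->nr_pending; i++)
        ctx->elapsed_us += ctx->pending[i].cost_us;
    ctx->nr_pending = 0;
}

/* Any system call: without deferral, pending task work runs first. */
static void toy_syscall(struct toy_ctx *ctx, int cost_us)
{
    if (!ctx->defer)
        toy_run_work(ctx);
    ctx->elapsed_us += cost_us;
}

/* Waiting for completions: pending task work always runs here. */
static void toy_get_events(struct toy_ctx *ctx)
{
    toy_run_work(ctx);
}

/* Returns the simulated time at which the send has been issued. */
long toy_send_latency(int defer)
{
    struct toy_ctx ctx = { .defer = defer };

    toy_queue_work(&ctx, 100);
    toy_queue_work(&ctx, 100);

    toy_syscall(&ctx, 10);      /* the send itself */
    long sent_at = ctx.elapsed_us;

    toy_get_events(&ctx);       /* now collect completions */
    return sent_at;
}
```

In this model toy_send_latency(0) returns 210 and toy_send_latency(1)
returns 10. The total amount of work is the same in both cases; only
the order changes, so the peer gets its response before the local task
work is processed.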