On 12/4/22 7:44 PM, Pavel Begunkov wrote:
> Optimise CQ locking for event posting depending on a number of ring setup flags.
> QD1 nop benchmark showed 12.067 -> 12.565 MIOPS increase, which is more than 8.5%
> of the io_uring kernel overhead (taking into account that the syscall overhead
> is over 50%) or 4.12% of the total performance. Naturally, it's not only about
> QD1, applications can submit a bunch of requests but their completions may
> arrive randomly, hurting batching and so performance (or latency).
>
> The downside is that we have to punt all io-wq completions to the
> original task. The performance win should diminish with better
> completion batching, but it should be worth it as it also helps tw,
> which in reality often doesn't complete too many requests.
>
> The feature depends on DEFER_TASKRUN but can be relaxed to SINGLE_ISSUER

Let's hash out the details for MSG_RING later, if we have to.

-- 
Jens Axboe