On 2/24/20 7:36 PM, Jens Axboe wrote:
> On 2/24/20 7:14 PM, Carter Li 李通洲 wrote:
>>> On Feb 25, 2020, at 8:39 AM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>>>
>>> I got curious about the performance of the idea of having only 1 CQE per
>>> link (for the failed or last one), and tested it with a quick and dirty
>>> patch doing submit-and-reap of a nops-link (patched for inline execution).
>>>
>>> 1) link size: 100
>>> old: 206 ns per nop
>>> new: 144 ns per nop
>>>
>>> 2) link size: 10
>>> old: 234 ns per nop
>>> new: 181 ns per nop
>>>
>>> 3) link size: 10, FORCE_ASYNC
>>> old: 667 ns per nop
>>> new: 569 ns per nop
>>>
>>> The patch below breaks sequences, linked_timeout, and who knows what
>>> else. The first one requires synchronisation/atomics, so it's a bit in
>>> the way. I've been wondering whether IOSQE_IO_DRAIN is popular and how
>>> much it's used. We can try to find a tradeoff, or even disable it with
>>> this feature.
>>
>> Hello Pavel,
>>
>> I still think flags tagged on SQEs would be a better choice, which gives
>> users the ability to decide whether they want to ignore the CQEs, not
>> only for links but also for normal SQEs.
>>
>> In addition, boxed CQEs couldn't resolve the issue of IORING_IO_TIMEOUT.
>
> I would tend to agree, and it'd be trivial to just set the flag on
> whatever SQEs in the chain you don't care about. Or even on an individual
> SQE, though that's probably a bit more of a reach in terms of use case.
> Maybe a nop with drain + ignore?
>
> In any case, it's definitely more flexible.

In the interest of taking this to the extreme, I tried a nop benchmark on
my laptop (qemu/kvm). Granted, this setup is particularly sensitive to
spinlocks; they are a lot more expensive there than on a real host.

Anyway, regular nops run at about 9.5M/sec with a single thread. Flagging
all SQEs with IOSQE_NO_CQE nets me about 14M/sec, so a handy improvement.
Looking at the top of the profiles:

cqe-per-sqe:

+   28.45%  io_uring  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
+   14.38%  io_uring  [kernel.kallsyms]  [k] io_submit_sqes
+    9.38%  io_uring  [kernel.kallsyms]  [k] io_put_req
+    7.25%  io_uring  libc-2.31.so       [.] syscall
+    6.12%  io_uring  [kernel.kallsyms]  [k] kmem_cache_free

no-cqes:

+   19.72%  io_uring  [kernel.kallsyms]  [k] io_put_req
+   11.93%  io_uring  [kernel.kallsyms]  [k] io_submit_sqes
+   10.14%  io_uring  [kernel.kallsyms]  [k] kmem_cache_free
+    9.55%  io_uring  libc-2.31.so       [.] syscall
+    7.48%  io_uring  [kernel.kallsyms]  [k] __io_queue_sqe

I'll try real disk IO tomorrow, using polled IO.

-- 
Jens Axboe
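
For anyone who wants to poke at the userspace side, here's a rough,
untested liburing sketch of what the per-SQE flag could look like: a link
of nops where every SQE but the last suppresses its CQE, so only one
completion needs to be reaped. The liburing calls are the stock ones;
IOSQE_NO_CQE is the hypothetical flag being discussed here, the bit value
below is just a placeholder, and a kernel without support would fail such
requests with -EINVAL.

/*
 * Sketch only: link LINK_LEN nops together and flag everything except
 * the tail with the (hypothetical) IOSQE_NO_CQE, then reap the single
 * CQE that the tail posts.
 */
#include <stdio.h>
#include <liburing.h>

#ifndef IOSQE_NO_CQE
#define IOSQE_NO_CQE	(1U << 5)	/* placeholder, not a real ABI bit */
#endif

#define LINK_LEN	10

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	int i, ret;

	ret = io_uring_queue_init(2 * LINK_LEN, &ring, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/* Build the link; drop CQEs for all but the tail. */
	for (i = 0; i < LINK_LEN; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		io_uring_prep_nop(sqe);
		if (i + 1 < LINK_LEN)
			sqe->flags |= IOSQE_IO_LINK | IOSQE_NO_CQE;
	}

	ret = io_uring_submit(&ring);
	if (ret != LINK_LEN) {
		fprintf(stderr, "submit: %d\n", ret);
		return 1;
	}

	/* Only the tail of the link should post a completion. */
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret < 0) {
		fprintf(stderr, "wait_cqe: %d\n", ret);
		return 1;
	}
	printf("tail cqe res=%d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	return 0;
}

Since the flag is per-SQE rather than per-link, the same trick works just
as well on unlinked SQEs, which is the flexibility argument above.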