On 2/24/20 5:39 PM, Pavel Begunkov wrote:
> I've got curious about performance of the idea of having only 1 CQE per link
> (for the failed or last one). Tested it with a quick dirty patch doing
> submit-and-reap of a nops-link (patched for inline execution).
>
> 1) link size: 100
> old: 206 ns per nop
> new: 144 ns per nop
>
> 2) link size: 10
> old: 234 ns per nop
> new: 181 ns per nop
>
> 3) link size: 10, FORCE_ASYNC
> old: 667 ns per nop
> new: 569 ns per nop
>
>
> The patch below breaks sequences, linked_timeout and who knows what else.
> The first one requires synchronisation/atomic, so it's a bit in the way. I've
> been wondering, whether IOSQE_IO_DRAIN is popular and how much it's used. We can
> try to find tradeoff or even disable it with this feature.

For a more realistic workload, I can try and run a random read workload
on a fast device. If I just make the QD the link count, then we'll have
the same amount in parallel, just with link-depth ratio less CQEs. I'd
be curious to see what that does.

-- 
Jens Axboe
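To put the quoted numbers and the "link-depth ratio less CQEs" remark in concrete terms, here is a rough sketch of the arithmetic. The helper names `saving_pct` and `cqes_posted` are hypothetical and come from neither the patch nor the kernel. The CQE count is a simplified model that assumes the one-CQE-per-link scheme posts exactly one CQE per link, whether the link fails or completes.

```c
#include <assert.h>

/* Hypothetical helper: relative per-nop saving, in percent,
 * when the cost drops from old_ns to new_ns. */
static double saving_pct(double old_ns, double new_ns)
{
	assert(old_ns > 0.0);
	return 100.0 * (old_ns - new_ns) / old_ns;
}

/* Hypothetical helper: CQEs posted for `requests` total requests that are
 * grouped into links of `link_len` requests each.
 *
 * per_link_cqe == 0: one CQE per request (existing behaviour)
 * per_link_cqe != 0: one CQE per link (the idea being measured);
 *                    a trailing partial link still counts as one link.
 */
static unsigned int cqes_posted(unsigned int requests, unsigned int link_len,
				int per_link_cqe)
{
	assert(link_len > 0);
	if (!per_link_cqe)
		return requests;
	return (requests + link_len - 1) / link_len;
}
```

With the figures quoted above, `saving_pct(206, 144)` is roughly 30%, `saving_pct(234, 181)` roughly 23%, and `saving_pct(667, 569)` roughly 15%. Under the same simplified model, 100 requests in links of 10 would post 10 CQEs instead of 100.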