Re: [RFC] single cqe per link

Jens Axboe <axboe@xxxxxxxxx> · Mon, 24 Feb 2020 19:24:09 -0700

On 2/24/20 5:39 PM, Pavel Begunkov wrote:
> I got curious about the performance of having only 1 CQE per link
> (posted for the failed request, or for the last one). I tested it with a
> quick and dirty patch doing submit-and-reap of a link of nops (with nops
> patched to execute inline).
> 
> 1) link size: 100
> old: 206 ns per nop
> new: 144 ns per nop
> 
> 2) link size: 10
> old: 234 ns per nop
> new: 181 ns per nop
> 
> 3) link size: 10, FORCE_ASYNC
> old: 667 ns per nop
> new: 569 ns per nop
> 
> 
> The patch below breaks sequences, linked_timeout and who knows what else.
> The first of these (sequences) requires synchronisation/atomics, so it's a
> bit in the way. I've been wondering whether IOSQE_IO_DRAIN is popular and
> how much it's used. We could try to find a tradeoff, or even disallow
> draining in combination with this feature.

For a more realistic test, I can try running a random read workload
on a fast device. If I just make the number of links equal to the QD,
then we'll have the same amount of I/O in parallel, just with fewer
CQEs by a factor of the link depth. I'd be curious to see what that does.

-- 
Jens Axboe