Re: [PATCH 0/4] iopoll support for io_uring/nvme passthrough

Jens Axboe <axboe@xxxxxxxxx> · Fri, 5 Aug 2022 11:18:38 -0600

On 8/5/22 11:04 AM, Jens Axboe wrote:
> On 8/5/22 9:42 AM, Kanchan Joshi wrote:
>> Hi,
>>
>> Series enables async polling on io_uring command, and nvme passthrough
>> (for io-commands) is wired up to leverage that.
>>
>> 512b randread performance (KIOP) below:
>>
>> QD_batch    block    passthru    passthru-poll   block-poll
>> 1_1          80        81          158            157
>> 8_2         406       470          680            700
>> 16_4        620       656          931            920
>> 128_32      879       1056        1120            1132
> 
> Curious on why passthru is slower than block-poll? Are we missing
> something here?

I took a quick peek, running it here. List of items making it slower:

- No fixedbufs support for passthru, each each request will go through
  get_user_pages() and put_pages() on completion. This is about a 10%
  change for me, by itself.

- nvme_uring_cmd_io() -> nvme_alloc_user_request() -> blk_rq_map_user()
  -> blk_rq_map_user_iov() -> memset() is another ~4% for me.

- The kmalloc+kfree per command is roughly 9% extra slowdown.

There are other little things, but the above are the main ones. Even if
I disable fixedbufs for non-passthru, passthru is about ~24% slower
here using a single device and a single core, which is mostly the above
mentioned items.

This isn't specific to the iopoll support, that's obviously faster than
IRQ driven for this test case. This is just comparing passthru with
the regular block path for doing random 512b reads.

-- 
Jens Axboe