On Fri, Aug 05, 2022 at 11:18:38AM -0600, Jens Axboe wrote:
> On 8/5/22 11:04 AM, Jens Axboe wrote:
> > On 8/5/22 9:42 AM, Kanchan Joshi wrote:
> >> Hi,
> >>
> >> Series enables async polling on io_uring command, and nvme passthrough
> >> (for io-commands) is wired up to leverage that.
> >>
> >> 512b randread performance (KIOP) below:
> >>
> >> QD_batch    block    passthru    passthru-poll    block-poll
> >> 1_1         80       81          158              157
> >> 8_2         406      470         680              700
> >> 16_4        620      656         931              920
> >> 128_32      879      1056        1120             1132
> >
> > Curious on why passthru is slower than block-poll? Are we missing
> > something here?
>
> I took a quick peek, running it here. List of items making it slower:
>
> - No fixedbufs support for passthru, each request will go through
>   get_user_pages() and put_pages() on completion. This is about a 10%
>   change for me, by itself.

Enabling fixed buffer support through here looks like it will take a
little bit of work. The driver needs an opcode or flag to tell it the
user address is a fixed buffer, and io_uring needs to export its
registered buffer for a driver like nvme to get to (a rough liburing
sketch of what fixedbufs buys the regular block path today follows at
the end of this mail).

> - nvme_uring_cmd_io() -> nvme_alloc_user_request() -> blk_rq_map_user()
>   -> blk_rq_map_user_iov() -> memset() is another ~4% for me.

Where's the memset() coming from? That should only happen if we need to
bounce, right? This type of request shouldn't need that unless you're
using odd user address alignment (see the second sketch at the end of
this mail for what I'd consider a well-aligned passthru submission).
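
To make the comparison concrete, here is roughly what fixedbufs gets
you on the regular block path today via liburing's READ_FIXED: the
buffer is pinned once at registration time and referenced by index per
IO, so the per-IO get_user_pages()/put_page() cycle disappears. This is
only a sketch; the device path and 4k size are placeholders, and error
handling is minimal:

#define _GNU_SOURCE
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct iovec iov;
	int fd;

	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	if (posix_memalign(&iov.iov_base, 4096, 4096))
		return 1;
	iov.iov_len = 4096;

	if (io_uring_queue_init(8, &ring, 0))
		return 1;

	/* Pin and map the buffer once, up front; no per-IO page
	 * pinning from here on. */
	if (io_uring_register_buffers(&ring, &iov, 1))
		return 1;

	sqe = io_uring_get_sqe(&ring);
	/* Final argument is the index into the registered buffer table. */
	io_uring_prep_read_fixed(sqe, fd, iov.iov_base, 4096, 0, 0);

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	printf("read: res=%d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}

Getting the equivalent through uring_cmd presumably means something
like a buffer index carried in struct nvme_uring_cmd (or a flag plus an
existing field), and an io_uring helper the driver can call to resolve
that index to the pre-pinned pages.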
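
And on the alignment point, here is a minimal sketch of a passthru read
through the nvme generic char device with a page-aligned buffer, which
is the case where blk_rq_map_user_iov() should map the user pages
directly instead of copying. The device name, nsid=1, a 512b LBA size,
and reading from LBA 0 are all assumptions made purely for the sketch:

#include <liburing.h>
#include <linux/nvme_ioctl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>

#define BUF_LEN 4096

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct nvme_uring_cmd *cmd;
	void *buf;
	int fd;

	/* nvme uring_cmd needs the big SQE/CQE formats */
	if (io_uring_queue_init(8, &ring,
				IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
		return 1;

	fd = open("/dev/ng0n1", O_RDONLY);
	if (fd < 0)
		return 1;

	/* Page-aligned buffer: satisfies the queue's dma alignment,
	 * so no bounce/copy when mapping. */
	if (posix_memalign(&buf, 4096, BUF_LEN))
		return 1;

	sqe = io_uring_get_sqe(&ring);
	memset(sqe, 0, 128);		/* SQE128: each entry is 128 bytes */
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fd;
	sqe->cmd_op = NVME_URING_CMD_IO;

	cmd = (struct nvme_uring_cmd *)sqe->cmd;
	cmd->opcode = 0x02;			/* nvme read */
	cmd->nsid = 1;				/* assumed */
	cmd->addr = (__u64)(uintptr_t)buf;
	cmd->data_len = BUF_LEN;
	cmd->cdw10 = 0;				/* slba, low 32 bits */
	cmd->cdw11 = 0;				/* slba, high 32 bits */
	cmd->cdw12 = BUF_LEN / 512 - 1;		/* 0-based block count */

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}

Point cmd->addr at an oddly-offset address instead and the iov
alignment check in blk_rq_map_user_iov() should force the copy path,
which is the only place I'd expect a memset to show up.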