On Fri, Aug 05, 2022 at 11:18:38AM -0600, Jens Axboe wrote:
> On 8/5/22 11:04 AM, Jens Axboe wrote:
> > On 8/5/22 9:42 AM, Kanchan Joshi wrote:
> >> Hi,
> >>
> >> Series enables async polling on io_uring command, and nvme passthrough
> >> (for io-commands) is wired up to leverage that.
> >>
> >> 512b randread performance (KIOP) below:
> >>
> >> QD_batch    block    passthru    passthru-poll    block-poll
> >> 1_1         80       81          158              157
> >> 8_2         406      470         680              700
> >> 16_4        620      656         931              920
> >> 128_32      879      1056        1120             1132
> >
> > Curious on why passthru is slower than block-poll? Are we missing
> > something here?
>
> I took a quick peek, running it here. List of items making it slower:
>
> - No fixedbufs support for passthru, each request will go through
>   get_user_pages() and put_pages() on completion. This is about a 10%
>   change for me, by itself.

Enabling fixed buffer support through here looks like it will take a
little bit of work. The driver needs an opcode or flag to tell it the
user address is a fixed buffer, and io_uring needs to export its
registered buffer for a driver like nvme to get to (a rough liburing
sketch of what fixedbufs buys the regular block path today follows at
the end of this mail).

> - nvme_uring_cmd_io() -> nvme_alloc_user_request() -> blk_rq_map_user()
>   -> blk_rq_map_user_iov() -> memset() is another ~4% for me.

Where's the memset() coming from? That should only happen if we need to
bounce, right? This type of request shouldn't need that unless you're
using odd user address alignment (see the second sketch at the end of
this mail for what I'd consider a well-aligned passthru submission).
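
To make the comparison concrete, here is roughly what fixedbufs gets
you on the regular block path today via liburing's READ_FIXED: the
buffer is pinned once at registration time and referenced by index per
IO, so the per-IO get_user_pages()/put_page() cycle disappears. This is
only a sketch; the device path and 4k size are placeholders, and error
handling is minimal:

#define _GNU_SOURCE
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct iovec iov;
	int fd;

	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	if (posix_memalign(&iov.iov_base, 4096, 4096))
		return 1;
	iov.iov_len = 4096;

	if (io_uring_queue_init(8, &ring, 0))
		return 1;

	/* Pin and map the buffer once, up front; no per-IO page
	 * pinning from here on. */
	if (io_uring_register_buffers(&ring, &iov, 1))
		return 1;

	sqe = io_uring_get_sqe(&ring);
	/* Final argument is the index into the registered buffer table. */
	io_uring_prep_read_fixed(sqe, fd, iov.iov_base, 4096, 0, 0);

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	printf("read: res=%d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}

Getting the equivalent through uring_cmd presumably means something
like a buffer index carried in struct nvme_uring_cmd (or a flag plus an
existing field), and an io_uring helper the driver can call to resolve
that index to the pre-pinned pages.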
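
And on the alignment point, here is a minimal sketch of a passthru read
through the nvme generic char device with a page-aligned buffer, which
is the case where blk_rq_map_user_iov() should map the user pages
directly instead of copying. The device name, nsid=1, a 512b LBA size,
and reading from LBA 0 are all assumptions made purely for the sketch:

#include <liburing.h>
#include <linux/nvme_ioctl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>

#define BUF_LEN 4096

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct nvme_uring_cmd *cmd;
	void *buf;
	int fd;

	/* nvme uring_cmd needs the big SQE/CQE formats */
	if (io_uring_queue_init(8, &ring,
				IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
		return 1;

	fd = open("/dev/ng0n1", O_RDONLY);
	if (fd < 0)
		return 1;

	/* Page-aligned buffer: satisfies the queue's dma alignment,
	 * so no bounce/copy when mapping. */
	if (posix_memalign(&buf, 4096, BUF_LEN))
		return 1;

	sqe = io_uring_get_sqe(&ring);
	memset(sqe, 0, 128);		/* SQE128: each entry is 128 bytes */
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fd;
	sqe->cmd_op = NVME_URING_CMD_IO;

	cmd = (struct nvme_uring_cmd *)sqe->cmd;
	cmd->opcode = 0x02;			/* nvme read */
	cmd->nsid = 1;				/* assumed */
	cmd->addr = (__u64)(uintptr_t)buf;
	cmd->data_len = BUF_LEN;
	cmd->cdw10 = 0;				/* slba, low 32 bits */
	cmd->cdw11 = 0;				/* slba, high 32 bits */
	cmd->cdw12 = BUF_LEN / 512 - 1;		/* 0-based block count */

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}

Point cmd->addr at an oddly-offset address instead and the iov
alignment check in blk_rq_map_user_iov() should force the copy path,
which is the only place I'd expect a memset to show up.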