On 8/5/22 12:11 PM, Keith Busch wrote:
> On Fri, Aug 05, 2022 at 11:18:38AM -0600, Jens Axboe wrote:
>> On 8/5/22 11:04 AM, Jens Axboe wrote:
>>> On 8/5/22 9:42 AM, Kanchan Joshi wrote:
>>>> Hi,
>>>>
>>>> Series enables async polling on io_uring command, and nvme passthrough
>>>> (for io-commands) is wired up to leverage that.
>>>>
>>>> 512b randread performance (KIOP) below:
>>>>
>>>> QD_batch    block    passthru    passthru-poll    block-poll
>>>> 1_1         80       81          158              157
>>>> 8_2         406      470         680              700
>>>> 16_4        620      656         931              920
>>>> 128_32      879      1056        1120             1132
>>>
>>> Curious why passthru is slower than block-poll? Are we missing
>>> something here?
>>
>> I took a quick peek, running it here. List of items making it slower:
>>
>> - No fixedbufs support for passthru, each request will go through
>>   get_user_pages() and put_pages() on completion. This is about a 10%
>>   change for me, by itself.
>
> Enabling fixed buffer support through here looks like it will take a
> little bit of work. The driver needs an opcode or flag to tell it the
> user address is a fixed buffer, and io_uring needs to export its
> registered buffer for a driver like nvme to get to.

Yeah, it's not a straightforward thing. But if this will be used with
recycled buffers, then it'll definitely be worthwhile to look into. A
rough sketch of what I'd imagine that plumbing looks like is at the
end of this mail.

>> - nvme_uring_cmd_io() -> nvme_alloc_user_request() -> blk_rq_map_user()
>>   -> blk_rq_map_user_iov() -> memset() is another ~4% for me.
>
> Where's the memset() coming from? That should only happen if we need
> to bounce, right? This type of request shouldn't need that unless
> you're using odd user address alignment.

Not sure, need to double check! Hacking up a patch to get rid of the
frivolous alloc+free, we'll see how it stands after that and I'll find
it.
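
On the fixed buffer side, completely untested and every io_uring-facing
name below is invented for illustration -- nothing like this exists
yet. The idea would be a new uring_cmd flag from userspace, plus a
helper that resolves the registered buffer into an iov_iter the driver
can hand to blk_rq_map_user_iov():

/*
 * Hypothetical sketch only: IORING_URING_CMD_FIXED and
 * io_uring_cmd_import_fixed() are stand-ins for the flag + export
 * Keith describes above, they don't exist today.
 */
static int nvme_map_user_request(struct request *req, u64 ubuf,
				 unsigned int len,
				 struct io_uring_cmd *ioucmd)
{
	struct request_queue *q = req->q;

	if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
		struct iov_iter iter;
		int ret;

		/* resolve the pre-registered buffer, no page pinning here */
		ret = io_uring_cmd_import_fixed(ubuf, len, rq_data_dir(req),
						&iter, ioucmd);
		if (ret < 0)
			return ret;
		return blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
	}

	/* current path: get_user_pages() on every request */
	return blk_rq_map_user(q, req, NULL, nvme_to_user_ptr(ubuf), len,
			       GFP_KERNEL);
}

Userspace would register the buffers once with io_uring_register_buffers()
and recycle them, so the per-request pin/unpin cost goes away entirely.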
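For anyone who wants to poke at the polled passthrough path behind the
numbers above, something like the below is the minimal setup for a
single polled 512b passthrough read. Untested as pasted, and the device
path and nsid are assumptions for whatever your setup has; it obviously
needs this series for IORING_SETUP_IOPOLL to work on uring_cmd:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <liburing.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct nvme_uring_cmd *cmd;
	void *buf;
	int fd, ret;

	/* nvme passthrough needs the big SQE/CQE variants */
	ret = io_uring_queue_init(8, &ring, IORING_SETUP_IOPOLL |
				  IORING_SETUP_SQE128 | IORING_SETUP_CQE32);
	if (ret < 0)
		return 1;

	/* generic char node, not the block device */
	fd = open("/dev/ng0n1", O_RDONLY);
	if (fd < 0)
		return 1;
	if (posix_memalign(&buf, 4096, 512))
		return 1;

	sqe = io_uring_get_sqe(&ring);
	memset(sqe, 0, 2 * sizeof(*sqe));	/* 128b sqe */
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fd;
	sqe->cmd_op = NVME_URING_CMD_IO;

	/* the passthrough command lives in the big-sqe payload */
	cmd = (struct nvme_uring_cmd *)sqe->cmd;
	cmd->opcode = 0x02;	/* NVMe read */
	cmd->nsid = 1;		/* assumption, check your namespace */
	cmd->addr = (__u64)(uintptr_t)buf;
	cmd->data_len = 512;
	cmd->cdw12 = 0;		/* nlb is 0-based: one LBA at slba 0 */

	io_uring_submit(&ring);

	/* IOPOLL ring: this spins the poll queue, no IRQ involved */
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		printf("res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}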
-- 
Jens Axboe