On 12/21/19 9:48 AM, Pavel Begunkov wrote:
> On 21/12/2019 19:38, Jens Axboe wrote:
>> On 12/21/19 9:20 AM, Pavel Begunkov wrote:
>>> On 21/12/2019 19:15, Pavel Begunkov wrote:
>>>> Double account ctx->refs, keeping the number of taken refs in ctx. As
>>>> io_uring gets per-request ctx->refs during submission, while holding
>>>> ctx->uring_lock, this allows bypassing percpu_ref_get*() and its
>>>> overhead most of the time.
>>>
>>> Jens, could you please benchmark with this one? Especially for the
>>> offloaded QD1 case. I haven't seen any difference for the nops test and
>>> don't have a decent SSD at hand to test it myself. We could drop it if
>>> there is no benefit.
>>>
>>> This rewrites that @extra_refs from the second one, so I left it for now.
>>
>> Sure, let me run a peak test, qd1 test, qd1+sqpoll test on
>> for-5.6/io_uring, same branch with 1-2, and same branch with 1-3. That
>> should give us a good comparison. One core used for all, and we're going
>> to be core speed bound for the performance in all cases on this setup.
>> So it'll be a good comparison.
>>
> Great, thanks!

For some reason, I'm not seeing much of a change between for-5.6/io_uring,
1+2, and 1+2+3; it's about the same, and the results seem very stable. For
reference, the top of the profile with 1-3 applied looks like this:

+    3.92%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
+    3.87%  io_uring  [kernel.vmlinux]  [k] blk_mq_get_request
+    3.43%  io_uring  [kernel.vmlinux]  [k] io_iopoll_getevents
+    3.03%  io_uring  [kernel.vmlinux]  [k] __slab_free
+    2.87%  io_uring  io_uring          [.] submitter_fn
+    2.79%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
+    2.75%  io_uring  [kernel.vmlinux]  [k] bio_alloc_bioset
+    2.70%  io_uring  [nvme_core]       [k] nvme_setup_cmd
+    2.59%  io_uring  [kernel.vmlinux]  [k] blk_mq_make_request
+    2.46%  io_uring  [kernel.vmlinux]  [k] io_prep_rw
+    2.32%  io_uring  [kernel.vmlinux]  [k] io_read
+    2.25%  io_uring  [kernel.vmlinux]  [k] blk_mq_free_request
+    2.19%  io_uring  [kernel.vmlinux]  [k] io_put_req
+    2.06%  io_uring  [kernel.vmlinux]  [k] kmem_cache_alloc
+    2.01%  io_uring  [kernel.vmlinux]  [k] generic_make_request_checks
+    1.90%  io_uring  [kernel.vmlinux]  [k] __sbitmap_get_word
+    1.86%  io_uring  [kernel.vmlinux]  [k] sbitmap_queue_clear
+    1.85%  io_uring  [kernel.vmlinux]  [k] io_issue_sqe

-- 
Jens Axboe