On 5/14/20 9:37 AM, Pavel Begunkov wrote:
>
>
> On 14/05/2020 18:15, Jens Axboe wrote:
>> On 5/14/20 8:53 AM, Pavel Begunkov wrote:
>>> On 14/05/2020 17:33, Jens Axboe wrote:
>>>> On 5/14/20 8:22 AM, Jens Axboe wrote:
>>>>>> I still use my previous io_uring_nop_stress tool to evaluate the improvement
>>>>>> in a physical machine. Memory 250GB and cpu is "Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz".
>>>>>> Before this patch:
>>>>>> $sudo taskset -c 60 ./io_uring_nop_stress -r 300
>>>>>> total ios: 1608773840
>>>>>> IOPS: 5362579
>>>>>>
>>>>>> With this patch:
>>>>>> sudo taskset -c 60 ./io_uring_nop_stress -r 300
>>>>>> total ios: 1676910736
>>>>>> IOPS: 5589702
>>>>>> About 4.2% improvement.
>>>>>
>>>>> That's not bad. Can you try the patch from Pekka as well, just to see if
>>>>> that helps for you?
>>>>>
>>>>> I also had another idea... We basically have two types of request life
>>>>> times:
>>>>>
>>>>> 1) io_kiocb can get queued up internally
>>>>> 2) io_kiocb completes inline
>>>>>
>>>>> For the latter, it's totally feasible to just have the io_kiocb on
>>>>> stack. The downside is if we need to go the slower path, then we need to
>>>>> alloc an io_kiocb then and copy it. But maybe that's OK... I'll play
>>>>> with it.
>>>
>>> Does it differ from having one pre-allocated req? Like fallback_req,
>>> but without atomics and returned only under uring_mutex (i.e. in
>>> __io_queue_sqe()). Putting aside its usefulness, at least it will have
>>> a chance to work with reads/writes.
>>
>> But then you need atomics. I actually think the bigger win here is not
>> having to use atomic refcounts for this particular part, since we know
>> the request can't get shared.
>
> Don't think we need, see:
>
> struct ctx {
>     /* protected by uring_mtx */
>     struct req *cache_req;
> }
>
> __io_queue_sqe()
> {
>     ret = issue_inline(req);
>     if (completed(ret)) {
>         // don't need req anymore, return it
>         ctx->cache_req = req;
>     } else if (need_async) {
>         // still under uring_mtx, just replenish the cache
>         // alloc()+memcpy() here for on-stack
>         ctx->cache_req = alloc_req();
>         punt(req);
>     }
>
>     // restored it in any case
>     assert(ctx->cache_req != NULL);
> }
>
> submit_sqes() __holds(uring_mtx)
> {
>     while (...) {
>         // already holding the mutex, no need for sync here
>         // also, there is always a req there
>         req = ctx->cache_req;
>         ctx->cache_req = NULL;
>         ...
>     }
> }

Hmm yes, good point, it should work pretty easily, barring the use cases
that do IRQ complete. But that was also a special case with the other
cache.

> BTW, there will be a lot of problems to make either work properly with
> IORING_FEAT_SUBMIT_STABLE.

How so? Once the request is set up, any state should be retained there.

-- 
Jens Axboe
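
For reference, a minimal self-contained userspace C model of the
cached-request scheme in Pavel's pseudocode above. All names here
(struct ctx, struct req, cache_req, issue_inline(), punt(),
submit_sqes()) mirror the pseudocode and are illustrative rather than
the kernel's actual io_uring internals; the single-threaded main()
stands in for running with uring_mtx held, which is why plain loads
and stores suffice and no atomic refcounts appear.

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for io_kiocb and io_ring_ctx. */
struct req {
	int opcode;
};

struct ctx {
	/* Protected by uring_mtx in the real design; this model is
	 * single-threaded, so the mutex is implied rather than taken. */
	struct req *cache_req;
};

/* Pretend odd opcodes cannot complete inline and must be punted. */
static bool issue_inline(struct req *req)
{
	return (req->opcode & 1) == 0;
}

/* The async side owns the request from here on; the toy model just
 * completes and frees it immediately. */
static void punt(struct req *req)
{
	free(req);
}

static void __io_queue_sqe(struct ctx *ctx, struct req *req)
{
	if (issue_inline(req)) {
		/* Completed inline: req is dead, recycle it. */
		ctx->cache_req = req;
	} else {
		/* Punting: req lives on, so replenish the cache while
		 * still under the (implied) uring_mtx. For an on-stack
		 * req, this is where the alloc()+memcpy() would go. */
		ctx->cache_req = malloc(sizeof(*req));
		punt(req);
	}
	/* Invariant restored either way: the submit loop below never
	 * sees a NULL cache_req. */
	assert(ctx->cache_req != NULL);
}

static void submit_sqes(struct ctx *ctx, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		/* Already holding the (implied) mutex, so no sync is
		 * needed here, and there is always a req available. */
		struct req *req = ctx->cache_req;

		ctx->cache_req = NULL;
		req->opcode = i;
		__io_queue_sqe(ctx, req);
	}
}

int main(void)
{
	struct ctx ctx = { .cache_req = malloc(sizeof(struct req)) };

	submit_sqes(&ctx, 8);
	free(ctx.cache_req);
	return 0;
}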