On 5/14/20 9:37 AM, Pavel Begunkov wrote:
>
>
> On 14/05/2020 18:15, Jens Axboe wrote:
>> On 5/14/20 8:53 AM, Pavel Begunkov wrote:
>>> On 14/05/2020 17:33, Jens Axboe wrote:
>>>> On 5/14/20 8:22 AM, Jens Axboe wrote:
>>>>>> I still use my previous io_uring_nop_stress tool to evaluate the improvement
>>>>>> in a physical machine. Memory 250GB and cpu is "Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz".
>>>>>> Before this patch:
>>>>>> $sudo taskset -c 60 ./io_uring_nop_stress -r 300
>>>>>> total ios: 1608773840
>>>>>> IOPS: 5362579
>>>>>>
>>>>>> With this patch:
>>>>>> sudo taskset -c 60 ./io_uring_nop_stress -r 300
>>>>>> total ios: 1676910736
>>>>>> IOPS: 5589702
>>>>>> About 4.2% improvement.
>>>>>
>>>>> That's not bad. Can you try the patch from Pekka as well, just to see if
>>>>> that helps for you?
>>>>>
>>>>> I also had another idea... We basically have two types of request life
>>>>> times:
>>>>>
>>>>> 1) io_kiocb can get queued up internally
>>>>> 2) io_kiocb completes inline
>>>>>
>>>>> For the latter, it's totally feasible to just have the io_kiocb on
>>>>> stack. The downside is if we need to go the slower path, then we need to
>>>>> alloc an io_kiocb then and copy it. But maybe that's OK... I'll play
>>>>> with it.
>>>
>>> Does it differ from having one pre-allocated req? Like fallback_req,
>>> but without atomics and returned only under uring_mutex (i.e. in
>>> __io_queue_sqe()). Putting aside its usefulness, at least it will have
>>> a chance to work with reads/writes.
>>
>> But then you need atomics. I actually think the bigger win here is not
>> having to use atomic refcounts for this particular part, since we know
>> the request can't get shared.
>
> Don't think we need, see:
>
> struct ctx {
>     /* protected by uring_mtx */
>     struct req *cache_req;
> }
>
> __io_queue_sqe()
> {
>     ret = issue_inline(req);
>     if (completed(ret)) {
>         // don't need req anymore, return it
>         ctx->cache_req = req;
>     } else if (need_async) {
>         // still under uring_mtx, just replenish the cache
>         // alloc()+memcpy() here for on-stack
>         ctx->cache_req = alloc_req();
>         punt(req);
>     }
>
>     // restored it in any case
>     assert(ctx->cache_req != NULL);
> }
>
> submit_sqes() __holds(uring_mtx)
> {
>     while (...) {
>         // already holding the mutex, no need for sync here
>         // also, there is always a req there
>         req = ctx->cache_req;
>         ctx->cache_req = NULL;
>         ...
>     }
> }

Hmm yes, good point, it should work pretty easily, barring the use cases
that do IRQ complete. But that was also a special case with the other
cache.

> BTW, there will be a lot of problems to make either work properly with
> IORING_FEAT_SUBMIT_STABLE.

How so? Once the request is set up, any state should be retained there.

-- 
Jens Axboe
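
For reference, a minimal self-contained userspace C model of the
cached-request scheme in Pavel's pseudocode above. All names here
(struct ctx, struct req, cache_req, issue_inline(), punt(),
submit_sqes()) mirror the pseudocode and are illustrative rather than
the kernel's actual io_uring internals; the single-threaded main()
stands in for running with uring_mtx held, which is why plain loads
and stores suffice and no atomic refcounts appear.

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for io_kiocb and io_ring_ctx. */
struct req {
	int opcode;
};

struct ctx {
	/* Protected by uring_mtx in the real design; this model is
	 * single-threaded, so the mutex is implied rather than taken. */
	struct req *cache_req;
};

/* Pretend odd opcodes cannot complete inline and must be punted. */
static bool issue_inline(struct req *req)
{
	return (req->opcode & 1) == 0;
}

/* The async side owns the request from here on; the toy model just
 * completes and frees it immediately. */
static void punt(struct req *req)
{
	free(req);
}

static void __io_queue_sqe(struct ctx *ctx, struct req *req)
{
	if (issue_inline(req)) {
		/* Completed inline: req is dead, recycle it. */
		ctx->cache_req = req;
	} else {
		/* Punting: req lives on, so replenish the cache while
		 * still under the (implied) uring_mtx. For an on-stack
		 * req, this is where the alloc()+memcpy() would go. */
		ctx->cache_req = malloc(sizeof(*req));
		punt(req);
	}
	/* Invariant restored either way: the submit loop below never
	 * sees a NULL cache_req. */
	assert(ctx->cache_req != NULL);
}

static void submit_sqes(struct ctx *ctx, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		/* Already holding the (implied) mutex, so no sync is
		 * needed here, and there is always a req available. */
		struct req *req = ctx->cache_req;

		ctx->cache_req = NULL;
		req->opcode = i;
		__io_queue_sqe(ctx, req);
	}
}

int main(void)
{
	struct ctx ctx = { .cache_req = malloc(sizeof(struct req)) };

	submit_sqes(&ctx, 8);
	free(ctx.cache_req);
	return 0;
}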