On 5/14/20 11:01 AM, Jens Axboe wrote:
> On 5/14/20 10:25 AM, Pavel Begunkov wrote:
>> On 14/05/2020 19:18, Pavel Begunkov wrote:
>>> On 14/05/2020 18:53, Jens Axboe wrote:
>>>> On 5/14/20 9:37 AM, Pavel Begunkov wrote:
>>>> Hmm yes good point, it should work pretty easily, barring the use
>>>> cases that do IRQ complete. But that was also a special case with
>>>> the other cache.
>>>>
>>>>> BTW, there will be a lot of problems to make either work properly
>>>>> with IORING_FEAT_SUBMIT_STABLE.
>>>>
>>>> How so? Once the request is set up, any state should be retained
>>>> there.
>>>
>>> If a late alloc fails (e.g. in __io_queue_sqe()), you'd need to post
>>> a CQE with an error. If there is no room in the CQ, postponing the
>>> completion would itself require an allocated req. Of course it can
>>> be dropped, but I'd prefer to have strict guarantees.
>>
>> I know how to do it right for my version.
>> Is it still just a for-fun thing, or do you think it'll be useful for
>> real I/O?
>
> We're definitely spending quite a bit of time on alloc+free and the
> atomics for the refcount. Considering we're core limited on some
> workloads, any cycles we can get back will ultimately increase
> performance. So yeah, definitely worth exploring and finding something
> that works.

BTW, one oddity of the NOP microbenchmark that makes it less than useful
as a general test case is that any request completes immediately. The
default setting of that test is to submit batches of 16, which means we
bulk allocate 16 io_kiocbs when we enter. But we only ever really need
one, as by the time we get to request #2, we've already freed the first
request (and so forth).

I kind of like the idea of recycling requests. If the completion does
happen inline, then we're cache hot for the next issue. Right now we go
through a new request every time, regardless of whether the previous one
just got freed.

--
Jens Axboe
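
A minimal sketch of what that single-slot recycling could look like,
assuming the 5.7-era io_uring internals. struct io_kiocb, struct
io_ring_ctx, req_cachep, kmem_cache_alloc(), and kmem_cache_free() are
real; the cached_req field and the io_recycle_*() helpers are
hypothetical names for illustration only:

/*
 * Hypothetical sketch: a per-ctx single-slot request cache. Assumes
 * completions happen inline under the uring lock; the IRQ-complete
 * case discussed above would need more care (or just bypass the slot).
 */
static struct io_kiocb *io_recycle_alloc(struct io_ring_ctx *ctx)
{
	struct io_kiocb *req = ctx->cached_req;	/* hypothetical field */

	if (req) {
		/* Reuse the request we just freed; likely still cache hot. */
		ctx->cached_req = NULL;
		return req;
	}
	/* Slow path: fall back to the slab, as the current code does. */
	return kmem_cache_alloc(req_cachep, GFP_KERNEL);
}

static void io_recycle_free(struct io_ring_ctx *ctx, struct io_kiocb *req)
{
	/* On inline completion, park the request for the next issue. */
	if (!ctx->cached_req)
		ctx->cached_req = req;
	else
		kmem_cache_free(req_cachep, req);
}

With the NOP benchmark's submit-16 pattern, something like this would
cut the 16 slab allocations per batch down to one after warm-up, since
each request is freed before the next one is issued.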