On 5/14/20 11:01 AM, Jens Axboe wrote:
> On 5/14/20 10:25 AM, Pavel Begunkov wrote:
>> On 14/05/2020 19:18, Pavel Begunkov wrote:
>>> On 14/05/2020 18:53, Jens Axboe wrote:
>>>> On 5/14/20 9:37 AM, Pavel Begunkov wrote:
>>>> Hmm yes good point, it should work pretty easily, barring the use
>>>> cases that do IRQ complete. But that was also a special case with
>>>> the other cache.
>>>>
>>>>> BTW, there will be a lot of problems to make either work properly
>>>>> with IORING_FEAT_SUBMIT_STABLE.
>>>>
>>>> How so? Once the request is set up, any state should be retained
>>>> there.
>>>
>>> If a late alloc fails (e.g. in __io_queue_sqe()), you'd need to post
>>> a CQE with an error. If there is no room in the CQ, postponing the
>>> completion would itself require an allocated req. Of course it can
>>> be dropped, but I'd prefer to have strict guarantees.
>>
>> I know how to do it right for my version.
>> Is it still just a for-fun thing, or do you think it'll be useful for
>> real I/O?
>
> We're definitely spending quite a bit of time on alloc+free and the
> atomics for the refcount. Considering we're core limited on some
> workloads, any cycles we can get back will ultimately increase
> performance. So yeah, definitely worth exploring and finding something
> that works.

BTW, one oddity of the NOP microbenchmark that makes it less than useful
as a general test case is that any request completes immediately. The
default setting of that test is to submit batches of 16, which means we
bulk allocate 16 io_kiocbs when we enter. But we only ever really need
one, as by the time we get to request #2, we've already freed the first
request (and so forth).

I kind of like the idea of recycling requests. If the completion does
happen inline, then we're cache hot for the next issue. Right now we go
through a new request every time, regardless of whether the previous one
just got freed.

--
Jens Axboe
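
A minimal sketch of what that single-slot recycling could look like,
assuming the 5.7-era io_uring internals. struct io_kiocb, struct
io_ring_ctx, req_cachep, kmem_cache_alloc(), and kmem_cache_free() are
real; the cached_req field and the io_recycle_*() helpers are
hypothetical names for illustration only:

/*
 * Hypothetical sketch: a per-ctx single-slot request cache. Assumes
 * completions happen inline under the uring lock; the IRQ-complete
 * case discussed above would need more care (or just bypass the slot).
 */
static struct io_kiocb *io_recycle_alloc(struct io_ring_ctx *ctx)
{
	struct io_kiocb *req = ctx->cached_req;	/* hypothetical field */

	if (req) {
		/* Reuse the request we just freed; likely still cache hot. */
		ctx->cached_req = NULL;
		return req;
	}
	/* Slow path: fall back to the slab, as the current code does. */
	return kmem_cache_alloc(req_cachep, GFP_KERNEL);
}

static void io_recycle_free(struct io_ring_ctx *ctx, struct io_kiocb *req)
{
	/* On inline completion, park the request for the next issue. */
	if (!ctx->cached_req)
		ctx->cached_req = req;
	else
		kmem_cache_free(req_cachep, req);
}

With the NOP benchmark's submit-16 pattern, something like this would
cut the 16 slab allocations per batch down to one after warm-up, since
each request is freed before the next one is issued.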