On 6/5/24 1:29 PM, Jens Axboe wrote:
> On 6/5/24 1:13 PM, Pavel Begunkov wrote:
>> On 6/5/24 17:31, Pavel Begunkov wrote:
>>> On 6/5/24 16:11, Pavel Begunkov wrote:
>>>> On 6/4/24 20:01, Jens Axboe wrote:
>>>>> io_uring currently uses percpu refcounts for the ring reference. This
>>>>> works fine, but exiting a ring requires an RCU grace period to lapse
>>>>> and this slows down ring exit quite a lot.
>>>>>
>>>>> Add a basic per-cpu counter for our references instead, and use that.
>>>>
>>>> All the synchronisation heavy lifting is done by RCU, what
>>>> makes it safe to read other CPUs counters in
>>>> io_ring_ref_maybe_done()?
>>>
>>> Other options are expedited RCU (Paul saying it's an order of
>>> magnitude faster), or to switch to plain atomics since it's cached,
>>> but it's only good if submitter and waiter are the same task. Paul
>>
>> I mixed it with task refs, ctx refs should be cached well
>> for any configuration as they're bound to requests (and req
>> caches).
>
> That's a good point, maybe even our current RCU approach is overkill
> since we do the caching pretty well. Let me run a quick test, just
> switching this to a basic atomic_t. The dead mask can just be the 31st
> bit.

Well, the exception is non-local task_work, where we still grab and put
a reference on the ctx for each context while iterating. Outside of
that, the request pre-alloc takes care of the rest.

-- 
Jens Axboe
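
For reference, a minimal sketch of what the "basic atomic_t" scheme
described above might look like, with bit 31 as the dead mask. The
struct and helper names (io_ring_ref, io_ring_ref_get/put/kill) are
illustrative assumptions, not the actual patch or io_uring's API:

/*
 * Sketch only: the low 31 bits count references, bit 31 marks the
 * ring as dying. Names are hypothetical.
 */
#include <linux/atomic.h>

#define IO_RING_REF_DEAD	(1U << 31)

struct io_ring_ref {
	atomic_t refs;	/* initialised to 1 (atomic_set) for the ring itself */
};

static inline void io_ring_ref_get(struct io_ring_ref *r)
{
	atomic_inc(&r->refs);
}

/* Returns true when a killed ring drops its final reference */
static inline bool io_ring_ref_put(struct io_ring_ref *r)
{
	/* only the DEAD bit left means: dying, and no references remain */
	return (unsigned int)atomic_dec_return(&r->refs) == IO_RING_REF_DEAD;
}

static inline bool io_ring_ref_kill(struct io_ring_ref *r)
{
	/* mark the ring as dying, then drop the initial reference */
	atomic_or((int)IO_RING_REF_DEAD, &r->refs);
	return io_ring_ref_put(r);
}

Because ctx references are bound to requests, which are pre-allocated
and cached per ring (as Pavel notes above), the atomic would rarely
bounce between CPUs, so this avoids the RCU grace period on ring exit
without turning the refcount into a cross-CPU hot spot.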