On 6/5/24 1:13 PM, Pavel Begunkov wrote:
> On 6/5/24 17:31, Pavel Begunkov wrote:
>> On 6/5/24 16:11, Pavel Begunkov wrote:
>>> On 6/4/24 20:01, Jens Axboe wrote:
>>>> io_uring currently uses percpu refcounts for the ring reference. This
>>>> works fine, but exiting a ring requires an RCU grace period to lapse
>>>> and this slows down ring exit quite a lot.
>>>>
>>>> Add a basic per-cpu counter for our references instead, and use that.
>>>
>>> All the synchronisation heavy lifting is done by RCU, so what
>>> makes it safe to read other CPUs' counters in
>>> io_ring_ref_maybe_done()?
>>
>> Other options are expedited RCU (Paul says it's an order of
>> magnitude faster), or switching to plain atomics since it's cached,
>> but that's only good if the submitter and waiter are the same task. Paul
>
> I mixed it up with task refs; ctx refs should be cached well
> for any configuration, as they're bound to requests (and req
> caches).

That's a good point, maybe even our current RCU approach is overkill
since we do the caching pretty well. Let me run a quick test, just
switching this to a basic atomic_t. The dead mask can just be the
31st bit.

-- 
Jens Axboe
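For reference, a minimal userspace sketch of the scheme being described: a single atomic reference count with bit 31 reserved as the dead mask. The names (ring_ref_get/put/kill) and the C11 atomics are illustrative assumptions, not the actual io_uring patch, which would use the kernel's atomic_t API instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit 31 marks the ring as dead; the low 31 bits are the refcount. */
#define RING_REF_DEAD	(1U << 31)

struct ring_ref {
	atomic_uint refs;
};

static void ring_ref_init(struct ring_ref *r)
{
	/* Start with one reference held by the ring itself. */
	atomic_init(&r->refs, 1);
}

static void ring_ref_get(struct ring_ref *r)
{
	atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
}

/* Returns true if this put dropped the last reference of a dead ring. */
static bool ring_ref_put(struct ring_ref *r)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&r->refs, 1, memory_order_acq_rel);
	return old == (RING_REF_DEAD | 1);
}

/* Mark the ring dead and drop the initial reference. */
static bool ring_ref_kill(struct ring_ref *r)
{
	atomic_fetch_or_explicit(&r->refs, RING_REF_DEAD,
				 memory_order_acq_rel);
	return ring_ref_put(r);
}
```

Because gets and puts are bound to requests (and hence hit the request caches), a single shared atomic may be cheap enough here, which is the point being tested above.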