Re: io_uring NAPI busy poll RCU is causing 50 context switches/second to my sqpoll thread

Pavel Begunkov <asml.silence@xxxxxxxxx> · Tue, 30 Jul 2024 21:25:58 +0100

On 7/30/24 21:05, Olivier Langlois wrote:
> If you are interested in all the details, they are all here:
> https://github.com/axboe/liburing/issues/1190
>
> It seems that I like to write a lot when I am investigating a problem.
> Pavel has been a great help in understanding what was happening.
>
> Next, I wondered where the RCU integration came from, and I
> found this:
> https://lore.kernel.org/all/89ef84bf-48c2-594c-cc9c-f796adcab5e8@xxxxxxxxx/
>
> I guess that, for some use cases, being able to automatically manage
> hundreds of NAPI devices, which can all suddenly be swept away during
> a device reconfiguration, is desirable...
>
> but in my case, this is an excessively high price to pay for
> flexibility that I do not need at all.

Removing an entry or two once every minute is definitely not
going to take 50% CPU. The RCU machinery runs in the background
regardless of whether io_uring uses it or not, and it's pretty
cheap considering amortisation.

If anything, your explanation makes it sound more like the
scheduler makes a wrong decision and schedules out the sqpoll
thread even though it could continue to run, but that needs
confirmation. Does the CPU your SQPOLL thread is pinned to stay
100% utilised?
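One way to answer that question is to sample the sqpoll thread's context-switch counters. Linux reports voluntary_ctxt_switches (the task blocked or slept) and nonvoluntary_ctxt_switches (the task was preempted while still runnable) in /proc/<pid>/task/<tid>/status; on recent kernels the sqpoll thread typically appears there as a task named iou-sqp-<pid>. The helper below is a minimal sketch: read_ctxt_switches() is a hypothetical name, and locating the right <tid> is left to the caller.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the context-switch counters from a Linux task status file,
 * e.g. /proc/<pid>/task/<tid>/status. Returns 0 when both counters
 * were found, -1 if the file cannot be opened or a field is missing.
 * (read_ctxt_switches is an illustrative name, not an existing API.)
 */
static int read_ctxt_switches(const char *status_path,
                              unsigned long *voluntary,
                              unsigned long *nonvoluntary)
{
    assert(status_path && voluntary && nonvoluntary);

    FILE *f = fopen(status_path, "r");
    if (!f)
        return -1;

    char line[256];
    int found = 0; /* bit 0: voluntary seen, bit 1: nonvoluntary seen */
    while (fgets(line, sizeof(line), f)) {
        /* Each sscanf only succeeds on the line whose prefix matches. */
        if (sscanf(line, "voluntary_ctxt_switches: %lu", voluntary) == 1)
            found |= 1;
        else if (sscanf(line, "nonvoluntary_ctxt_switches: %lu",
                        nonvoluntary) == 1)
            found |= 2;
    }
    fclose(f);
    return found == 3 ? 0 : -1;
}
```

Sampling twice, one second apart, and subtracting gives a per-second rate. A growing nonvoluntary count suggests the thread was preempted while it could still run, which is the scenario described above; a growing voluntary count instead means the thread itself went to sleep.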

> I have a single NAPI device. Once I know what it is, it will practically
> remain immutable until termination.
>
> For that reason, I think that offering some sort of polymorphic
> NAPI device tracking strategy customization would be desirable.
>
> The current one, the RCU one, I would call
> dynamic_napi_tracking (rcu could be peppered into the name somewhere so
> people know what the strategy is up to),
> whereas the new one that I am imagining would be called
> static_napi_tracking.
>
> NAPI devices would be added/removed by the user manually through an
> extended registration function.
>
> For the sake of convenience, a clear_list operation could even be
> offered.
>
> The benefits of this new static tracking strategy would be numerous:
> - it removes the need to invoke the heavy-duty RCU cavalry
> - no need to scan the list to remove stale devices
> - no need to search the list at each SQE submission to update the
>   device timeout value
>
> So is this a good idea in your opinion?
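To make the benefits list concrete, here is a hypothetical userspace model of the bookkeeping a dynamic tracker has to do: a lookup on every submission to insert an id or refresh its timeout, plus a periodic sweep for stale entries. A plain array stands in for whatever structure the kernel really uses, there is no locking or RCU, and all names (dyn_napi_touch(), dyn_napi_sweep(), the structs) are illustrative. Under the proposed static strategy, neither function would run: the set only changes when the user registers or unregisters an id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of dynamic NAPI tracking; not kernel code. */
#define DYN_NAPI_MAX 8

struct dyn_entry {
    unsigned int napi_id;
    uint64_t expires_ns;    /* refreshed on every submission */
};

struct dyn_napi_tracker {
    struct dyn_entry e[DYN_NAPI_MAX];
    size_t count;
    uint64_t timeout_ns;    /* how long an unused id stays tracked */
};

/* Per-submission path: search for the id, then refresh or insert it. */
static void dyn_napi_touch(struct dyn_napi_tracker *t, unsigned int napi_id,
                           uint64_t now_ns)
{
    assert(t);
    for (size_t i = 0; i < t->count; i++) {
        if (t->e[i].napi_id == napi_id) {
            t->e[i].expires_ns = now_ns + t->timeout_ns;
            return;
        }
    }
    if (napi_id != 0 && t->count < DYN_NAPI_MAX) {
        t->e[t->count].napi_id = napi_id;
        t->e[t->count].expires_ns = now_ns + t->timeout_ns;
        t->count++;
    }
}

/*
 * Poll-loop path: drop entries whose timeout has passed and return how
 * many were removed. Per the thread, this removal step is where the
 * kernel's RCU-deferred freeing comes in.
 */
static size_t dyn_napi_sweep(struct dyn_napi_tracker *t, uint64_t now_ns)
{
    assert(t);
    size_t removed = 0;
    for (size_t i = 0; i < t->count; ) {
        if (t->e[i].expires_ns < now_ns) {
            t->e[i] = t->e[--t->count];   /* swap last entry into the hole */
            removed++;
        } else {
            i++;
        }
    }
    return removed;
}
```

With a single NAPI device that never changes, every dyn_napi_touch() call finds the same entry and every sweep removes nothing, which is the overhead the static strategy would skip.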

I believe that's a good thing. I've been prototyping a similar,
if not the same, approach just today, i.e. the user [un]registers
a NAPI instance by the id you can get with SO_INCOMING_NAPI_ID.
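A rough sketch of that flow, under stated assumptions: the application reads the id with getsockopt(SO_INCOMING_NAPI_ID) once data has arrived on the socket, then hands it to a registration call. Because the registration interface was still a prototype at this point, the static list and its add/del/clear helpers below are a hypothetical userspace stand-in (including the clear_list operation suggested earlier), not an io_uring API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56 /* asm-generic value; may differ on some arches */
#endif

/*
 * NAPI id of the receive queue that last delivered data to this socket:
 * 0 if none is known yet, -1 on error (errno set; kernels built without
 * busy-poll support may reject the option).
 */
static long get_incoming_napi_id(int sockfd)
{
    unsigned int napi_id = 0;
    socklen_t len = sizeof(napi_id);

    if (getsockopt(sockfd, SOL_SOCKET, SO_INCOMING_NAPI_ID,
                   &napi_id, &len) < 0)
        return -1;
    return (long)napi_id;
}

/* Hypothetical static tracking table: changes only on explicit user calls. */
#define STATIC_NAPI_MAX 8

struct static_napi_list {
    unsigned int ids[STATIC_NAPI_MAX];
    size_t count;
};

/* Register an id; false if it is 0 (unknown), a duplicate, or the table is full. */
static bool static_napi_add(struct static_napi_list *l, unsigned int id)
{
    assert(l);
    if (id == 0 || l->count == STATIC_NAPI_MAX)
        return false;
    for (size_t i = 0; i < l->count; i++)
        if (l->ids[i] == id)
            return false;
    l->ids[l->count++] = id;
    return true;
}

/* Unregister an id by moving the last entry into its slot; false if absent. */
static bool static_napi_del(struct static_napi_list *l, unsigned int id)
{
    assert(l);
    for (size_t i = 0; i < l->count; i++) {
        if (l->ids[i] == id) {
            l->ids[i] = l->ids[--l->count];
            return true;
        }
    }
    return false;
}

/* The clear_list convenience operation from the proposal. */
static void static_napi_clear(struct static_napi_list *l)
{
    assert(l);
    l->count = 0;
}
```

In use, the application would call get_incoming_napi_id() after the first recv() on a socket and register the id only if it is positive; 0 means no NAPI id is known yet. In a real kernel implementation the table would still need protection against register/unregister calls racing with the sqpoll thread iterating it; this single-threaded model ignores that.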

--
Pavel Begunkov