On 7/30/24 21:05, Olivier Langlois wrote:
> If you are interested in all the details, they are all here:
> https://github.com/axboe/liburing/issues/1190
> It seems I like to write a lot when I am investigating a problem. Pavel has been a great help in understanding what was happening.
>
> Next, I started to question where the RCU integration came from and found this:
> https://lore.kernel.org/all/89ef84bf-48c2-594c-cc9c-f796adcab5e8@xxxxxxxxx/
>
> I guess that in some use cases, being able to automatically and dynamically manage hundreds of NAPI devices, which can suddenly all be swept away during a device reconfiguration, is desirable... but in my case, this is an excessively high price to pay for a flexibility that I do not need at all.
Removing an entry or two once every minute is definitely not going to take 50% CPU. The RCU machinery is running in the background regardless of whether io_uring uses it or not, and it's pretty cheap considering amortisation. If anything, from your explanation it sounds more like the scheduler makes a wrong decision and schedules out the SQPOLL thread even though it could continue to run, but that needs confirmation. Does the CPU your SQPOLL thread is pinned to stay 100% utilised?
> I have a single NAPI device. Once I know what it is, it will practically remain immutable until termination.
>
> For that reason, I am thinking that offering some sort of polymorphic NAPI device tracking strategy customization would be desirable. I would call the current RCU-based one dynamic_napi_tracking (rcu could be peppered into the name somewhere so people know what the strategy is up to), whereas the new one that I am imagining would be called static_napi_tracking. NAPI devices would be added/removed manually by the user through an extended registration function. For the sake of convenience, a clear_list operation could even be offered.
>
> The benefits of this new static tracking strategy would be numerous:
> - it removes the need to invoke the heavy-duty RCU cavalry
> - no need to scan the list to remove stale devices
> - no need to search the list at each SQE submission to update the device timeout value
>
> So is this a good idea in your opinion?
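To make the proposed static strategy concrete, here is a minimal userspace sketch of what a user-managed NAPI list with add, remove and clear_list operations might look like. Everything in it is hypothetical: the names (static_napi_list, static_napi_add, static_napi_del, static_napi_clear, static_napi_poll_all), the fixed capacity and the poll_fn callback are illustrative assumptions, not io_uring's actual implementation or API. It also assumes registration and polling never run concurrently; a kernel version would need some lock or equivalent to guarantee that.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed capacity for the user-registered NAPI ids. */
#define STATIC_NAPI_MAX 16

/* Hypothetical static list: changes only through explicit user calls. */
struct static_napi_list {
    unsigned int ids[STATIC_NAPI_MAX];
    size_t count;
};

/* Register a NAPI id. Returns 0, -EEXIST if already present, -ENOSPC if full. */
static int static_napi_add(struct static_napi_list *l, unsigned int id)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->ids[i] == id)
            return -EEXIST;
    if (l->count == STATIC_NAPI_MAX)
        return -ENOSPC;
    l->ids[l->count++] = id;
    return 0;
}

/* Unregister a NAPI id by moving the last entry into its slot.
 * Returns 0, or -ENOENT if the id was not registered. */
static int static_napi_del(struct static_napi_list *l, unsigned int id)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->ids[i] == id) {
            l->ids[i] = l->ids[--l->count];
            return 0;
        }
    }
    return -ENOENT;
}

/* The "clear_list" convenience operation. */
static void static_napi_clear(struct static_napi_list *l)
{
    l->count = 0;
}

/* Busy-poll pass: no per-submission lookup, no timeout bookkeeping and no
 * stale-entry scan, because entries only change via add/del/clear.
 * poll_fn is a stand-in for whatever actually polls one NAPI instance.
 * Assumes no concurrent add/del/clear (single-threaded toy model). */
static void static_napi_poll_all(const struct static_napi_list *l,
                                 void (*poll_fn)(unsigned int id))
{
    for (size_t i = 0; i < l->count; i++)
        poll_fn(l->ids[i]);
}
```

The point of the design is that the hot path (static_napi_poll_all) only walks a small array, while all list maintenance is pushed to explicit, rare user calls, which is the trade-off the proposal argues for when the set of NAPI devices is effectively fixed.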
I believe that's a good thing. I've been prototyping a similar, if not the same, approach just today, i.e. the user [un]registers a NAPI instance by the id you can get with SO_INCOMING_NAPI_ID.

--
Pavel Begunkov
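For reference, obtaining the NAPI id mentioned above from userspace is a plain getsockopt() call with SO_INCOMING_NAPI_ID. The sketch below shows only that step. The helper name get_incoming_napi_id is hypothetical, and the fallback value 56 is the asm-generic one (some architectures use a different number). How the id is then passed to an io_uring registration call is not shown, since that interface was still being prototyped in this thread.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56 /* asm-generic value; some arches differ */
#endif

/* Hypothetical helper: read the NAPI id the kernel recorded for the last
 * packet received on fd. Returns 0 on success with *napi_id set (0 means
 * no NAPI id has been recorded yet), or -errno on failure, typically
 * -ENOPROTOOPT on kernels built without busy-poll support. */
static int get_incoming_napi_id(int fd, unsigned int *napi_id)
{
    unsigned int id = 0;
    socklen_t len = sizeof(id);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &id, &len) < 0)
        return -errno;
    *napi_id = id;
    return 0;
}
```

In practice this would be called on an accepted or connected socket after some data has arrived, because a fresh socket reports 0 until a packet has gone through a NAPI context.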