Re: [PATCH next v1 2/2] io_uring: limit local tw done

Pavel Begunkov <asml.silence@xxxxxxxxx> · Thu, 21 Nov 2024 14:25:17 +0000

On 11/21/24 01:12, Jens Axboe wrote:
On 11/20/24 4:56 PM, Pavel Begunkov wrote:
On 11/20/24 22:14, David Wei wrote:
...
One thing that is not so nice is that now we have this handling and
checks in the hot path, and __io_run_local_work_loop() most likely
gets uninlined.

I don't think that really matters, it's pretty light. The main overhead
in this function is not the call, it's reordering requests and touching
cachelines of the requests.

I think it's pretty light as-is and actually looks pretty good. It's

It could be light, but the question is importance / frequency of
the new path. If it only happens rarely but affects a high 9,
then it'd make more sense to optimise it out of the common path.

also similar to how sqpoll bites over longer task_work lines, and
arguably a mistake that we allow huge depths of this when we can avoid
it with deferred task_work.

I wonder, can we just requeue it via task_work again? We can even
add a variant efficiently adding a list instead of a single entry,
i.e. local_task_work_add(head, tail, ...);

I think that can only work if we change work_llist to be a regular list
with regular locking. Otherwise it's a bit of a mess with the list being

Dylan once measured the overhead of locks vs atomics in this
path for some artificial case, we can pull the numbers up.

reordered, and then you're spending extra cycles on potentially
reordering all the entries again.

That sucks, I agree, but then it's the same question of how often
it happens.
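
To make the ordering problem concrete, here is a rough userspace sketch
using C11 atomics. It is only loosely modelled on a lockless LIFO list
and is not the io_uring code; struct node, struct lhead, list_add_batch()
and second_pass() are all made-up names for illustration. Entries 1..4
are queued, a run is assumed to stop after two of them, and the leftover
3 and 4 are pushed back with a single head/tail batch add, optionally
while a new entry 5 arrives in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a request and the lockless work list. */
struct node {
	struct node *next;
	int seq;		/* arrival order, for illustration only */
};

struct lhead {
	_Atomic(struct node *) first;
};

/* Push an already linked chain first..last onto the head in one CAS loop. */
static void list_add_batch(struct lhead *h, struct node *first,
			   struct node *last)
{
	struct node *old = atomic_load_explicit(&h->first, memory_order_relaxed);

	assert(first && last);
	do {
		last->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &old, first,
			memory_order_release, memory_order_relaxed));
}

/* Detach the whole list; the result is newest-first (LIFO). */
static struct node *list_del_all(struct lhead *h)
{
	return atomic_exchange_explicit(&h->first, (struct node *)NULL,
					memory_order_acquire);
}

/* Walk and relink every entry to flip the order. */
static struct node *list_reverse(struct node *n)
{
	struct node *prev = NULL;

	while (n) {
		struct node *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}

/* Returns the number of entries seen by the second run, in run order. */
int second_pass(int reverse_back, int newcomer, int out[])
{
	struct node n[5];
	struct lhead h;
	struct node *run, *first, *last, *pos;
	int i, cnt = 0;

	atomic_init(&h.first, (struct node *)NULL);
	for (i = 0; i < 4; i++) {
		n[i].seq = i + 1;
		list_add_batch(&h, &n[i], &n[i]);
	}

	/* First run: grab everything, restore FIFO order, "run" 1 and 2. */
	run = list_reverse(list_del_all(&h));
	first = run->next->next;	/* leftover chain 3 -> 4 */
	last = first->next;

	if (newcomer) {			/* entry 5 arrives before the requeue */
		n[4].seq = 5;
		list_add_batch(&h, &n[4], &n[4]);
	}
	if (reverse_back) {		/* extra pass over the leftovers */
		last = first;			/* 3 becomes the tail */
		first = list_reverse(first);	/* chain is now 4 -> 3 */
	}
	list_add_batch(&h, first, last);

	/* Second run: detach and reverse again, as the first run did. */
	for (pos = list_reverse(list_del_all(&h)); pos; pos = pos->next)
		out[cnt++] = pos->seq;
	return cnt;
}
```

In this sketch the leftovers only come out in order (3, 4) if they are
reversed back before the requeue and nothing else was added meanwhile.
With a concurrent add, the newer entry 5 ends up running ahead of 3 and
4 either way, since a head push cannot place old entries at the oldest
end of the list.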

--
Pavel Begunkov