On 11/21/24 00:52, David Wei wrote:
On 2024-11-20 15:56, Pavel Begunkov wrote:
On 11/20/24 22:14, David Wei wrote:
...
One thing that's not so nice is that we now have this handling and
these checks in the hot path, and __io_run_local_work_loop() will most
likely get uninlined.
I wonder, can we just requeue it via task_work again? We could even
add a variant that efficiently adds a list instead of a single entry,
i.e. local_task_work_add(head, tail, ...);
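Roughly, adding a whole chain at once means linking the chain in front of the current head with a single successful compare-and-swap, which is what the kernel's llist_add_batch(new_first, new_last, head) already does. Below is a userspace C11 model of that idea only; struct node, struct lifo and lifo_add_batch() are hypothetical names, and local_task_work_add() itself exists only as a proposal in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for a kernel llist node and head. */
struct node {
	struct node *next;
	int val;
};

struct lifo {
	_Atomic(struct node *) first;
};

/*
 * Push an already-linked chain first -> ... -> last onto the head with one
 * successful CAS instead of one CAS per node. Returns true if the list was
 * empty before the push (same convention as llist_add_batch()).
 */
static bool lifo_add_batch(struct lifo *head, struct node *first,
			   struct node *last)
{
	struct node *old = atomic_load_explicit(&head->first,
						memory_order_relaxed);

	assert(first != NULL && last != NULL);
	do {
		/* On CAS failure 'old' is refreshed, so relink and retry. */
		last->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&head->first, &old,
							first,
							memory_order_release,
							memory_order_relaxed));
	return old == NULL;
}
```

Concurrent single-entry producers only ever see the head change once per batch, so the cost of requeueing is one linking pass plus one CAS loop, regardless of the chain length.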
That was an early idea, but it means re-reversing the list and then
atomically adding each node back to work_llist concurrently with e.g.
io_req_local_work_add().
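Where the re-reversal comes from, in a simplified model: a lock-free list is pushed newest-first, so a consumer that drains it reverses the chain to run entries oldest-first (the kernel provides llist_reverse_order() for this). If the consumer stops early, the leftover chain is oldest-first and has to be reversed again, another O(n) pass, before it can be pushed back onto a newest-first list. The sketch below assumes that model; reverse() and run_some() are hypothetical helpers, not io_uring code.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	int val;
};

/* Reverse a singly linked chain in place, like llist_reverse_order(). */
static struct node *reverse(struct node *n)
{
	struct node *prev = NULL;

	while (n) {
		struct node *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}

/*
 * "Run" up to 'budget' entries from the front of an oldest-first chain and
 * return the untouched remainder, which is still oldest-first.
 */
static struct node *run_some(struct node *fifo, int budget, int *ran)
{
	*ran = 0;
	while (fifo && *ran < budget) {
		struct node *n = fifo;

		fifo = n->next;
		n->next = NULL; /* entry is done */
		(*ran)++;
	}
	return fifo;
}
```

Usage: drain a newest-first chain 3 -> 2 -> 1 -> 0, reverse it to 0 -> 1 -> 2 -> 3, run two entries, and the leftover 2 -> 3 must be reversed to 3 -> 2 before it is newest-first again and ready to be pushed back.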
Using a separate retry_llist means we don't need to concurrently add to
either retry_llist or work_llist.
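A userspace model of the separate-list approach, to make the concurrency argument concrete: producers only ever push onto the shared atomic list, while the consumer keeps its leftovers in a plain pointer that nothing else touches, so neither list needs a concurrent insert from the consumer side. Only the names work_llist and retry_llist come from this thread; struct ctx, push(), run_chain(), run_local_work() and the 'seen' array are hypothetical, and this is a sketch of the idea rather than the actual patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
	int val;
};

struct ctx {
	_Atomic(struct node *) work_list; /* like work_llist: shared, newest-first */
	struct node *retry_list;          /* like retry_llist: consumer-only, oldest-first */
};

/* Producer side: single-entry lock-free push onto the shared list. */
static void push(struct ctx *ctx, struct node *n)
{
	struct node *old = atomic_load_explicit(&ctx->work_list,
						memory_order_relaxed);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&ctx->work_list, &old,
							n,
							memory_order_release,
							memory_order_relaxed));
}

static struct node *reverse(struct node *n)
{
	struct node *prev = NULL;

	while (n) {
		struct node *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}

/* Run entries from *list until it is empty or 'done' reaches 'budget'. */
static int run_chain(struct node **list, int budget, int *seen, int done)
{
	while (*list && done < budget) {
		struct node *n = *list;

		*list = n->next;
		seen[done++] = n->val; /* stand-in for running the work item */
	}
	return done;
}

/* Consumer side: deferred entries first, then new work, at most 'budget'. */
static int run_local_work(struct ctx *ctx, int budget, int *seen)
{
	int done = run_chain(&ctx->retry_list, budget, seen, 0);

	/* done < budget implies the retry list has been fully drained. */
	if (done < budget) {
		struct node *all = atomic_exchange_explicit(&ctx->work_list,
							    (struct node *)NULL,
							    memory_order_acquire);
		struct node *fifo = reverse(all);

		done = run_chain(&fifo, budget, seen, done);
		ctx->retry_list = fifo; /* plain store: no other thread reads it */
	}
	return done;
}
```

Because the deferred entries are run before anything newly grabbed from work_list, older work still runs ahead of newer work across calls, without putting anything back on the shared list.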
I'm also curious: what's the use case you've got that is hitting
the problem?
There is a Memcache-like workload that sheds load based on the time
spent doing work. With epoll, the work of reading sockets and
processing a request is done by the user, which can decide after some
amount of time to drop the remaining work if it takes too long. With
io_uring, the work of reading sockets is done eagerly inside task work.
If there is a burst of work, so much time is spent in task work reading
from sockets that, by the time control returns to the user, the timeout
has already elapsed.
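For contrast, a minimal sketch of the epoll-style shedding loop described here, where the time budget is checked in userspace between requests. It assumes the requests have already been read off the sockets (the epoll and read() calls are left out), and struct request and process_with_deadline() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical request record; 'handled' stands in for a real reply. */
struct request {
	int id;
	int handled;
};

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Handle requests in order until the time budget is used up; everything
 * from the first unhandled index onward is shed. Returns how many were
 * handled.
 */
static size_t process_with_deadline(struct request *reqs, size_t n,
				    long long budget_ns)
{
	long long deadline = now_ns() + budget_ns;
	size_t i;

	for (i = 0; i < n; i++) {
		if (now_ns() >= deadline)
			break; /* shed reqs[i..n-1], e.g. reply with an error */
		reqs[i].handled = 1; /* stand-in for parsing and serving */
	}
	assert(i <= n);
	return i;
}
```

The check only helps if control actually comes back to this loop in time, which is what eager socket reads in task work take away.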
Interesting. It also sounds like, instead of an arbitrary 20, we
might want the user to feed the limit to us. That might be easier to
do with the bpf toy, so we don't have to carve out another argument.
--
Pavel Begunkov