On 11/21/24 14:34, Jens Axboe wrote:
On 11/21/24 7:29 AM, Pavel Begunkov wrote:
On 11/21/24 00:52, David Wei wrote:
On 2024-11-20 15:56, Pavel Begunkov wrote:
On 11/20/24 22:14, David Wei wrote:
...
There is a Memcache-like workload that has load shedding based on the
time spent doing work. With epoll, the work of reading sockets and
processing a request is done by user, which can decide after some amount
of time to drop the remaining work if it takes too long. With io_uring,
the work of reading sockets is done eagerly inside of task work. If
there is a burst of work, then so much time is spent in task work
reading from sockets that, by the time control returns to user, the
timeout has already elapsed.
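
To make the epoll-style load shedding described above concrete, here is a
minimal sketch of a user-space loop that handles requests until a time budget
runs out and leaves the remainder to be dropped. The names
(process_with_budget, clock_fn, handle_fn, struct sim) are hypothetical and
are not taken from the workload or from any io_uring API. The simulated clock
exists only to keep the example deterministic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clock callback: returns a monotonic time in nanoseconds. */
typedef uint64_t (*clock_fn)(void *ctx);

/* Hypothetical application handler for one already-received request. */
typedef void (*handle_fn)(void *ctx, int req);

/*
 * Handle requests in order until either all nr are done or budget_ns has
 * elapsed since the start of the batch. Returns the number handled; the
 * caller sheds (drops) the remaining nr - returned requests.
 */
static size_t process_with_budget(const int *reqs, size_t nr,
                                  uint64_t budget_ns,
                                  clock_fn now, handle_fn handle, void *ctx)
{
    uint64_t start = now(ctx);
    size_t done = 0;

    /* The budget is checked before each request, so user code stays in control. */
    while (done < nr && now(ctx) - start < budget_ns) {
        handle(ctx, reqs[done]);
        done++;
    }
    return done;
}

/* Deterministic stand-ins for a real clock and handler (illustration only). */
struct sim {
    uint64_t now_ns;   /* simulated current time */
    uint64_t cost_ns;  /* simulated cost of handling one request */
};

static uint64_t sim_now(void *ctx)
{
    return ((struct sim *)ctx)->now_ns;
}

static void sim_handle(void *ctx, int req)
{
    struct sim *s = ctx;

    (void)req;
    s->now_ns += s->cost_ns;  /* each request "takes" cost_ns */
}
```

The check sits in the application's own loop, which is the property the
paragraph above says is lost when the socket reads happen eagerly in task
work before control returns to user.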
Interesting, it also sounds like instead of an arbitrary 20 we
might want the user to feed it to us. Might be easier to do it
with the bpf toy not to carve another argument.
David and I did discuss that, and I was not in favor of having an extra
argument. We really just need some kind of limit to prevent it
over-running. Arguably that should always be min_events, which we
already have, but that kind of runs afoul of applications just doing
io_uring_wait_cqe() and hence asking for 1. That's why the hand wavy
number exists, which is really no different than other hand wavy numbers
we have to limit running of "something" - eg other kinds of retries.
Adding another argument to this just again doubles wait logic complexity
in terms of the API. If it's needed down the line for whatever reason,
then yeah we can certainly do it, probably via the wait regions. But
adding it to the generic wait path would be a mistake imho.
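
One way to picture the limit discussed above is a cap that honours min_events
but never drops below a fixed floor, so that a caller asking for 1 still gets
a reasonable batch. This is only a sketch: combining the two values with max()
is an assumption, the floor of 20 is simply the "arbitrary 20" mentioned
earlier in the thread, and the names TW_DEFAULT_MAX, tw_run_limit and
tw_items_to_run are hypothetical rather than kernel identifiers.

```c
/* Assumed floor, taken from the "arbitrary 20" mentioned in the thread. */
#define TW_DEFAULT_MAX 20u

/* Hypothetical helper: upper bound on task work items to run for one wait. */
static unsigned int tw_run_limit(unsigned int min_events)
{
    return min_events > TW_DEFAULT_MAX ? min_events : TW_DEFAULT_MAX;
}

/*
 * Hypothetical helper: how many of the pending items to run now.
 * Anything beyond the limit stays queued for a later pass.
 */
static unsigned int tw_items_to_run(unsigned int pending,
                                    unsigned int min_events)
{
    unsigned int limit = tw_run_limit(min_events);

    return pending < limit ? pending : limit;
}
```

With these assumptions, a waiter asking for 1 event with 100 items pending
would run 20 of them, while a waiter asking for 64 would run 64.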
Right, I don't like the idea of a wait argument either, messy and
too advanced of a tuning for it. BPF would be fine as it could be
made as a hint and easily removed if needed. It could also be a per
ring REGISTER based hint, a bit worse but with the same deprecation
argument. Obviously would need a good user / use case first.
I also strongly suspect this is the last we'll ever hear of this, and
for that reason alone I don't think it's worth any kind of extra
arguments or added complexity.
--
Pavel Begunkov