Re: [PATCH next v1 2/2] io_uring: limit local tw done

On 11/21/24 7:58 AM, Pavel Begunkov wrote:
> On 11/21/24 14:34, Jens Axboe wrote:
>> On 11/21/24 7:29 AM, Pavel Begunkov wrote:
>>> On 11/21/24 00:52, David Wei wrote:
>>>> On 2024-11-20 15:56, Pavel Begunkov wrote:
>>>>> On 11/20/24 22:14, David Wei wrote:
> ...
>>>> There is a Memcache-like workload that has load shedding based on the
>>>> time spent doing work. With epoll, the work of reading sockets and
>>>> processing a request is done by user, which can decide after some amount
>>>> of time to drop the remaining work if it takes too long. With io_uring,
>>>> the work of reading sockets is done eagerly inside of task work. If
>>>> there is a burst of work, then so much time is spent in task work
>>>> reading from sockets that, by the time control returns to user the
>>>> timeout has already elapsed.
>>>
>>> Interesting, it also sounds like instead of an arbitrary 20 we
>>> might want the user to feed it to us. Might be easier to do it
>>> with the bpf toy not to carve another argument.
>>
>> David and I did discuss that, and I was not in favor of having an extra
>> argument. We really just need some kind of limit to prevent it
>> over-running. Arguably that should always be min_events, which we
>> already have, but that kind of runs afoul of applications just doing
>> io_uring_wait_cqe() and hence asking for 1. That's why the hand wavy
>> number exists, which is really no different than other hand wavy numbers
>> we have to limit running of "something" - eg other kinds of retries.
>>
>> Adding another argument to this just again doubles wait logic complexity
>> in terms of the API. If it's needed down the line for whatever reason,
>> then yeah we can certainly do it, probably via the wait regions. But
>> adding it to the generic wait path would be a mistake imho.
> 
> Right, I don't like the idea of a wait argument either, messy and
> too advanced of a tuning for it. BPF would be fine as it could be
> made as a hint and easily removed if needed. It could also be a per
> ring REGISTER based hint, a bit worse but with the same deprecation
> argument. Obviously would need a good user / use case first.

Sure, I agree on that. If you already have BPF integration for other
things, then yeah, you could add tighter control for it. Even with that,
I doubt it'd be useful or meaningful, really. The current case seems like
a worst-case kind of thing: recv task_work is arguably one of the more
expensive things that can be done out of task_work.
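
For illustration only, the kind of cap being discussed could be sketched
roughly as below. This is a userspace toy, not the patch itself: the names
TW_DEFAULT_MAX, tw_budget() and run_local_work() are made up for the
example, and taking the larger of min_events and the fixed number is an
assumption based on the description above.

```c
/* Hypothetical default cap, mirroring the "arbitrary 20" in this thread. */
#define TW_DEFAULT_MAX 20

/*
 * Budget for a single pass: at least the default cap, and never less
 * than what the waiter asked for via min_events (assumed semantics).
 */
static int tw_budget(int min_events)
{
	return min_events > TW_DEFAULT_MAX ? min_events : TW_DEFAULT_MAX;
}

/*
 * Run at most one budget's worth of the `pending` items and return how
 * many were run. Whatever is left over stays queued for a later pass,
 * so control gets back to the caller sooner during a burst.
 */
static int run_local_work(int pending, int min_events)
{
	int budget = tw_budget(min_events);

	return pending < budget ? pending : budget;
}
```

With 500 items pending and a waiter asking for 1 event, run_local_work()
handles 20 and returns, rather than draining the whole burst in one go.
A waiter asking for 64 gets a budget of 64.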

We can revisit down the line if it ever becomes needed.

-- 
Jens Axboe



