On 8/19/21 4:53 PM, Jens Axboe wrote:
> On 8/18/21 5:42 AM, Pavel Begunkov wrote:
>> In essence, it's about two features. The first one is implemented by
>> 1-2 and saves ->uring_lock lock/unlock in a single call of
>> tctx_task_work(). Should be useful for links, apolls and BPF requests
>> at some moment.
>>
>> The second feature (3/3) is batching freeing and completing of
>> IRQ-based read/write requests.
>>
>> Haven't got numbers yet, but just throwing it for public discussion.
>
> I ran some numbers and it looks good to me, it's a nice boost for the
> IRQ completions. It's funny how the initial move to task_work for IRQ
> completions took a small hit, but there's so many optimizations that it
> unlocks that it's already better than before.
>
> I'd like to apply 1/3 for now, but it depends on both master and
> for-5.15/io_uring. Hence I think it'd be better to defer that one until
> after the initial batch has gone in.
>
> For the batched locking, the principle is sound and measures out to be a
> nice win. But I have a hard time getting over the passed lock state, I
> do wonder if there's a cleaner way to accomplish this...

The initial idea was to have a request flag telling whether a task_work
function may need the lock, but setting/clearing it would be more subtle.
Then there is io_poll_task_func -> io_req_task_submit -> lock, and the
read/write completions would have to be based on trylock, because
otherwise I'd be afraid of it hurting latency. This version looks good
enough, apart from conditional locking always being a pain.

We can hide the bool in a struct and, with a bunch of helpers, leave no
visibility into it. Though I don't think it actually helps anything.
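For reference, roughly what I have in mind, as an untested sketch with
made-up names (tctx_task_work() would own the state, and handlers only
ever go through the helpers):

/* untested sketch, names are made up */
struct io_tw_state {
	bool locked;		/* do we currently hold ->uring_lock? */
};

static inline void io_tw_lock(struct io_ring_ctx *ctx,
			      struct io_tw_state *ts)
{
	if (!ts->locked) {
		mutex_lock(&ctx->uring_lock);
		ts->locked = true;
	}
}

static inline void io_tw_unlock_flush(struct io_ring_ctx *ctx,
				      struct io_tw_state *ts)
{
	if (ts->locked) {
		io_submit_flush_completions(ctx);
		mutex_unlock(&ctx->uring_lock);
		ts->locked = false;
	}
}

tctx_task_work() would init the state, run the queued handlers, and call
io_tw_unlock_flush() once at the end; a handler that really needs the
lock (e.g. io_req_task_submit) just calls io_tw_lock() and otherwise
doesn't care whether it was already held.

--
Pavel Begunkov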