On 3/18/22 9:13 AM, Pavel Begunkov wrote: > On 3/18/22 14:54, Jens Axboe wrote: >> On 3/18/22 7:52 AM, Pavel Begunkov wrote: >>> When only one task submits requests, most of CQEs are expected to be >>> filled from that task context so we have natural serialisation. That >>> would mean that in those cases we don't need spinlocking around CQE >>> posting. One downside is that it also mean that io-wq workers can't emit >>> CQEs directly but should do it through the original task context using >>> task_works. That may hurt latency and performance and might matter much >>> to some workloads, but it's not a huge deal in general as io-wq is a >>> slow path and there is some additional merit from tw completion >>> batching. >> >> Not too worried about io-wq task_work for cq filling, it is the slower >> path after all. And I think we can get away with doing notifications as >> it's just for CQ filling. If the task is currently waiting in >> cqring_wait, then it'll get woken anyway and it will process task work. >> If it's in userspace, it doesn't need a notification. That should make >> it somewhat lighter than requiring using TIF_NOTIFY_SIGNAL for that. >> >>> The feature should be opted-in by the userspace by setting a new >>> IORING_SETUP_PRIVATE_CQ flag. It doesn't work with IOPOLL, and also for >>> now only the task that created a ring can submit requests to it. >> >> I know this is a WIP, but why do we need CQ_PRIVATE? And this needs to > > One reason is because of the io-wq -> tw punting, which is not optimal > for e.g. active users of IOSQE_ASYNC. The second is because the > fundamental requirement is that only one task should be submitting > requests. Was thinking about automating it, e.g. when we register > a second tctx we go through a slow path waiting for all current tw > to complete and then removing an internal and not userspace visible > CQ_PRIVATE flag. Was thinking something along those lines too. The alternative is setting up the ring with SETUP_SINGLE_ISSUER or something like that, having the application tell us that it is a single issuer and no submits are shared across threads. Serves the same kind of purpose as CQ_PRIVATE, but enables us to simply fail things if the task violates those constraints. Would also be a better name I believe as it might enable further optimizations in the future, like for example the mutex reduction for submits. > Also, as SQPOLL task is by definition the only one submitting SQEs, > was thinking about enabling it by default for them, but didn't do > because of the io-wq / IOSQE_ASYNC. Gotcha. >> work with registered files (and ring fd) as that is probably a bigger >> win than skipping the completion_lock if you're not shared anyway. > > It does work with fixed/registered files and registered io_uring fds. t/io_uring fails for me with registered files or rings, getting EINVAL. Might be user error, but that's simply just setting CQ_PRIVATE for setup. > In regards of "a bigger win", probably in many cases, but if you submit > a good batch at once, and completion tw batching doesn't kick in (e.g. > direct bdev read of not too high intensity), it might save > N spinlock/unlock when registered ring fd would kill only one pair of > fdget/fdput. Definitely, various cases where one would be a bigger win than the other, agree on that. But let's just ensure that both work together :-) -- Jens Axboe