On 11/25/21 09:35, Hao Xu wrote:
On 2021/11/11 12:42 AM, Pavel Begunkov wrote:
On 11/10/21 16:14, Jens Axboe wrote:
On 11/10/21 8:49 AM, Pavel Begunkov wrote:
Posting a CQE is expensive enough, and there are other
reasons to want to skip them, e.g. for link handling, and
it may just be more convenient for userspace.
Try to cover most of the use cases with one flag. The overhead
is one "if (cqe->flags & IOSQE_CQE_SKIP_SUCCESS)" check per
request and a slightly bloated req_set_fail(), which should be bearable.
I like the idea. One thing I'm struggling with is that I think a normal
use case of this would be fast IO, where we still need to know that a
completion event has happened; we just don't need to know the details of
it, since we already know what those details would be if it ends in
success.
How about having a skip counter? That would supposedly also allow drain
to work, and it could be mapped with the other cq parts to allow the app
to see it as well.
It doesn't go through the expensive io_cqring_ev_posted(), so userspace
can't really wait on it. It can do some linking tricks to alleviate that,
but I don't see any new capabilities from the current approach.
Also, locking is a problem. I was thinking about it, mainly hoping
that I could adjust cq_extra and leave draining as is, but it didn't
look great to me. AFAIK, it would have to be an atomic, which defeats
the purpose of the thing.
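To illustrate the concern, here is a minimal userspace sketch of a skip counter that is updated without a lock. All names (toy_cq_stats, toy_note_skipped, toy_read_skipped) are hypothetical and do not exist in io_uring; the point is only that every skipped completion would then pay for an atomic read-modify-write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter of skipped completions; not an io_uring field. */
struct toy_cq_stats {
    _Atomic uint32_t skipped;
};

/* Completion side: no lock is held in this model, so the increment
 * has to be an atomic read-modify-write. */
static void toy_note_skipped(struct toy_cq_stats *s)
{
    assert(s != NULL);
    atomic_fetch_add(&s->skipped, 1);
}

/* Reader side, e.g. an application polling the counter. */
static uint32_t toy_read_skipped(struct toy_cq_stats *s)
{
    assert(s != NULL);
    return atomic_load(&s->skipped);
}
```

The atomic increment on every skipped completion is the cost being objected to: the flag exists to make the no-CQE path cheaper.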
For drain requests, we just need to adjust cq_extra:
    if (!skip)
        fill_cqe;
    else
        cq_extra--;
cq_extra is already protected by completion_lock.
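The adjustment above can be sketched as a small standalone model. This is a toy under stated assumptions, not the kernel code: toy_ring, toy_complete and toy_drain_must_wait are hypothetical names, locking is left out entirely, and the drain check assumes a drain request may proceed once the posted-CQE count equals its submission sequence number corrected by cq_extra.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; not the kernel's io_ring_ctx. */
struct toy_ring {
    uint32_t cq_tail;   /* number of CQEs actually posted */
    int32_t  cq_extra;  /* correction for completions that posted no CQE */
};

/* Complete one request. 'skip' models a successful request that asked
 * for its CQE to be skipped. */
static void toy_complete(struct toy_ring *r, bool skip)
{
    assert(r != NULL);
    if (!skip)
        r->cq_tail++;   /* stands in for fill_cqe */
    else
        r->cq_extra--;  /* keeps the drain accounting balanced */
}

/* A drain request submitted after 'seq' earlier requests must wait
 * until all of them have completed, whether or not they posted a CQE. */
static bool toy_drain_must_wait(const struct toy_ring *r, uint32_t seq)
{
    assert(r != NULL);
    return seq + (uint32_t)r->cq_extra != r->cq_tail;
}
```

With three earlier requests, one of them skipped, the drain request stops waiting once cq_tail reaches 2 and cq_extra is -1. In the real code the decrement would have to happen under completion_lock, which is exactly what the skip path avoids taking.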
Yes, and we don't take the lock in __io_submit_flush_completions()
when not posting.
Another option is to split it in two: one counter kept under
->uring_lock and another under ->completion_lock. But it'll be messy:
shifting the flushing part of draining to a work-queue for mutex locking,
adding yet another bunch of counters that are hard to maintain, and so on.
And __io_submit_flush_completions() would also need to go through
the request list one extra time to do the accounting, as I wouldn't
want to grow the massively inlined io_req_complete_state().
--
Pavel Begunkov