> On Feb 14, 2020, at 8:52 PM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> On 2/14/2020 2:27 PM, Carter Li 李通洲 wrote:
>>
>>> On Feb 14, 2020, at 6:34 PM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>>>
>>> On 2/14/2020 11:29 AM, Carter Li 李通洲 wrote:
>>>> To implement io_uring_wait_cqe_timeout, we introduce a magic number
>>>> called `LIBURING_UDATA_TIMEOUT`. The problem is that not only must
>>>> we make sure that users never set sqe->user_data to
>>>> LIBURING_UDATA_TIMEOUT, but we also introduce extra complexity to
>>>> filter out TIMEOUT cqes.
>>>>
>>>> Former discussion: https://github.com/axboe/liburing/issues/53
>>>>
>>>> I’m suggesting introducing a new SQE flag called IOSQE_IGNORE_CQE
>>>> to solve this problem.
>>>>
>>>> A sqe tagged with the IOSQE_IGNORE_CQE flag won’t generate a cqe
>>>> on completion, so IORING_OP_TIMEOUT can be filtered on the kernel
>>>> side.
>>>>
>>>> In addition, `IOSQE_IGNORE_CQE` can be used to save cq size.
>>>>
>>>> For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
>>>> usually don’t care what the result of `POLL_ADD` is ( since it will
>>>> always be POLLIN ), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD`
>>>> to save lots of cq size.
>>>>
>>>> Besides POLL_ADD, people usually don’t care about the result of
>>>> POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE. These operations can
>>>> also be tagged with IOSQE_IGNORE_CQE.
>>>>
>>>> Thoughts?
>>>>
>>>
>>> I like the idea! And that's one of my TODOs for the eBPF plans.
>>> Let me list my use cases, so we can think how to extend it a bit.
>>>
>>> 1. In case of link fail, we need to reap all -ECANCELED, analyse it and
>>> resubmit the rest. It's quite inconvenient. We may want to have CQEs
>>> only for requests that were not cancelled.
>>>
>>> 2. When a chain succeeds, in most cases you already know the result
>>> of all intermediate CQEs, but you still need to reap and match them.
>>> I'd prefer to have only 1 CQE per link, that is either for the first
>>> failed or for the last request in the chain.
>>>
>>> These 2 may shed much processing overhead from userspace.
>>
>> I couldn't agree more!
>>
>> Another problem is that io_uring_enter will be woken up on completion
>> of every operation in a link, which results in unnecessary context
>> switches. Once woken, users have nothing to do but issue another
>> io_uring_enter syscall to wait for completion of the entire link chain.
>
> Good point. Sounds like I have one more thing to do :)
> Would the behaviour as in (2) cover all your needs?

(2) should cover most cases for me. For cases it couldn’t cover ( if any ),
I can still use normal sqes.

>
> There is a nuisance with linked timeouts, but I think it's reasonable,
> for REQ->LINKED_TIMEOUT where the timeout didn't fire, to notify only
> for REQ.
>
>>>
>>> 3. If we generate requests by eBPF, even the notion of a per-request
>>> event may break.
>>> - eBPF creating new requests would also need to specify user-data, and
>>> this may be problematic from the user perspective.
>>> - may want to not generate CQEs automatically, but let eBPF do it.
>>>
>>> --
>>> Pavel Begunkov
>>
>
> --
> Pavel Begunkov
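
For reference, the userspace bookkeeping discussed in this thread can be sketched roughly as below. This is only an illustration under stated assumptions, not liburing code: `struct mock_cqe` is a hypothetical stand-in for `struct io_uring_cqe`, `MOCK_UDATA_TIMEOUT` is an assumed sentinel modelled on `LIBURING_UDATA_TIMEOUT`, and both helper functions are made up for this example. The first helper shows the filtering that a magic user_data value forces on the caller; the second shows the "first failed or last request" selection from (2), done by hand after reaping every CQE of a link.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct io_uring_cqe (only the fields used here). */
struct mock_cqe {
    uint64_t user_data;
    int32_t  res;
};

/* Assumed sentinel, modelled on liburing's LIBURING_UDATA_TIMEOUT. */
#define MOCK_UDATA_TIMEOUT UINT64_MAX

/*
 * Remove internal timeout completions from an already-reaped batch, in place.
 * Returns the number of CQEs kept. This is the extra filtering step that a
 * kernel-side "don't post a CQE" flag would make unnecessary.
 */
static size_t drop_timeout_cqes(struct mock_cqe *cqes, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (cqes[i].user_data == MOCK_UDATA_TIMEOUT)
            continue;               /* internal timeout, hide from the user */
        cqes[kept++] = cqes[i];
    }
    return kept;
}

/*
 * Given the CQEs of one link chain in submission order, pick the single
 * completion that use case (2) asks for: the first request that really
 * failed (anything negative other than -ECANCELED), or else the last
 * request in the chain.
 */
static const struct mock_cqe *summarize_link(const struct mock_cqe *cqes,
                                             size_t n)
{
    assert(n > 0);

    for (size_t i = 0; i < n; i++) {
        if (cqes[i].res < 0 && cqes[i].res != -ECANCELED)
            return &cqes[i];        /* first genuine failure */
    }
    return &cqes[n - 1];            /* no genuine failure: report the tail */
}
```

With a chain such as POLL_ADD -> READ -> WRITE where READ fails with -EIO and WRITE is cancelled, `summarize_link` would return the READ completion; the application still has to reap all three CQEs first, which is the overhead the proposal aims to remove.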