On 8/3/20 11:04 PM, Jiufei Xue wrote: > > > On 2020/8/4 下午12:50, Jens Axboe wrote: >> On 8/3/20 7:29 PM, Jiufei Xue wrote: >>> >>> Hi Jens, >>> On 2020/8/4 上午12:41, Jens Axboe wrote: >>>> On 8/2/20 9:16 PM, Jiufei Xue wrote: >>>>> Hi Jens, >>>>> >>>>> On 2020/7/31 上午11:57, Jens Axboe wrote: >>>>>> Then why not just make the sqe-less timeout path flush existing requests, >>>>>> if it needs to? Seems a lot simpler than adding odd x2 variants, which >>>>>> won't really be clear. >>>>>> >>>>> Flushing the requests will access and modify the head of submit queue, that >>>>> may race with the submit thread. I think the reap thread should not touch >>>>> the submit queue when IORING_FEAT_GETEVENTS_TIMEOUT is supported. >>>> >>>> Ahhh, that's the clue I was missing, yes that's a good point! >>>> >>>>>> Chances are, if it's called with sq entries pending, the caller likely >>>>>> wants those submitted. Either the caller was aware and relying on that >>>>>> behavior, or the caller is simply buggy and has a case where it doesn't >>>>>> submit IO before waiting for completions. >>>>>> >>>>> >>>>> That is not true when the SQ/CQ handling are split in two different threads. >>>>> The reaping thread is not aware of the submit queue. It should only wait for >>>>> completion of the requests, such as below: >>>>> >>>>> submitting_thread: reaping_thread: >>>>> >>>>> io_uring_get_sqe() >>>>> io_uring_prep_nop() >>>>> io_uring_wait_cqe_timeout2() >>>>> io_uring_submit() >>>>> woken if requests are completed or timeout >>>>> >>>>> >>>>> And if the SQ/CQ handling are in the same thread, applications should use the >>>>> old API if they do not want to submit the request themselves. >>>>> >>>>> io_uring_get_sqe >>>>> io_uring_prep_nop >>>>> io_uring_wait_cqe_timeout >>>> >>>> Thanks, yes it's all clear to me now. I do wonder if we can't come up with >>>> something better than postfixing the functions with a 2, that seems kind of >>>> ugly and doesn't really convey to anyone what the difference is. >>>> >>>> Any suggestions for better naming? >>>> >>> how about io_uring_wait_cqe_timeout_nolock()? That means applications can use >>> the new APIs without synchronization. >> >> But even applications that don't share the ring across submit/complete >> threads will want to use the new interface, if supported by the kernel. >> Yes, if they share, they must use it - but even if they don't, it's >> likely going to be a more logical interface for them. >> >> So I don't think that _nolock() really conveys that very well, but at >> the same time I don't have any great suggestions. >> >> io_uring_wait_cqe_timeout_direct()? Or we could go simpler and just call >> it io_uring_wait_cqe_timeout_r(), which is a familiar theme from libc >> that is applied to thread safe implementations. >> >> I'll ponder this a bit... >> > > As suggested by Stefan, applications can pass a flag, say > IORING_SETUP_GETEVENTS_TIMEOUT to initialize the ring to indicate they > want to use the new feature. Function io_uring_wait_cqes() need to > submit the timeout sqe neither the kernel is not supported nor > applications do not want to use the new feature. We should not add a private setup flag for this, as the kernel doesn't really care, and hence the flag wouldn't even exist on the kernel side. This is all liburing internals. We could have a function in liburing ala io_uring_set_thread_shared() or something and mark it appropriately, and then have the existing API do the right thing. We could even have it return 0/-errno so that applications could handle the case where the kernel doesn't support it. Should probably name it after what it is, though, so io_uring_set_cqwait_timeout() or something like that. We also need to consider what should happen if the app has asked for this behavior, and wait_cqe() is called with pending submissions. Do we return an error for this case, or just assume that's what the applications wants? There's really two cases here: 1) liburing wants to use the newer timeout mechanism, but the application hasn't told us if it cares or not. 2) Application explicitly asked for the new behavior, as it relies on it. This is a trivial case, as we should not be looking at the SQ side at all for this one. For #1, probably safest to just flush existing submissions and still use the new API. If the app didn't ask for thread safe SQ/CQ, then it should not care. And this is existing behavior, after all. So maybe it'll work out OK. I guess the only remaining hurdle is the change of struct io_uring, which we'll need to handle carefully. -- Jens Axboe