On 2/3/21 4:49 AM, Pavel Begunkov wrote: > On 02/02/2021 20:56, Pavel Begunkov wrote: >> On 02/02/2021 20:48, Jens Axboe wrote: >>> On 2/2/21 1:34 PM, Pavel Begunkov wrote: >>>> On 02/02/2021 17:41, Pavel Begunkov wrote: >>>>> On 02/02/2021 17:24, Jens Axboe wrote: >>>>>> On 2/2/21 10:10 AM, Victor Stewart wrote: >>>>>>>> Can you send the updated test app? >>>>>>> >>>>>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56 >>>>>>> >>>>>>> same link i just updated the same gist >>>>>> >>>>>> And how are you running it? >>>>> >>>>> with SQPOLL with FIXED FLAG -> FAILURE: failed with error = ??? >>>>> -> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??) >>>> >>>> Ok, _io_uring_get_cqe() is just screwed twice >>>> >>>> TL;DR >>>> we enter into it with submit=0, do an iteration, which decrements it, >>>> then a second iteration passes submit=-1, which is returned back by >>>> the kernel as a result and propagated back from liburing... >>> >>> Yep, that's what I came up with too. We really just need a clear way >>> of knowing when to break out, and when to keep going. Eg if we've >>> done a loop and don't end up calling the system call, then there's >>> no point in continuing. >> >> We can bodge something up (and forget about it), and do much cleaner >> for IORING_FEAT_EXT_ARG, because we don't have LIBURING_UDATA_TIMEOUT >> reqs for it and so can remove peek and so on. > > This version looks reasonably simple, and even passes tests and all > issues found by Victor's test. Didn't test it yet, but should behave > similarly in regard of internal timeouts (pre IORING_FEAT_EXT_ARG). > > static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr, > struct get_data *data) > { > struct io_uring_cqe *cqe = NULL; > int ret = 0, err; > > do { > unsigned flags = 0; > unsigned nr_available; > bool enter = false; > > err = __io_uring_peek_cqe(ring, &cqe, &nr_available); > if (err) > break; > > /* IOPOLL won't proceed when there're not reaped CQEs */ > if (cqe && (ring->flags & IORING_SETUP_IOPOLL)) > data->wait_nr = 0; > > if (data->wait_nr > nr_available || cq_ring_needs_flush(ring)) { > flags = IORING_ENTER_GETEVENTS | data->get_flags; > enter = true; > } > if (data->submit) { > sq_ring_needs_enter(ring, &flags); > enter = true; > } > if (!enter) > break; > > ret = __sys_io_uring_enter2(ring->ring_fd, data->submit, > data->wait_nr, flags, data->arg, > data->sz); > if (ret < 0) { > err = -errno; > break; > } > data->submit -= ret; > } while (1); > > *cqe_ptr = cqe; > return err; > } So here's my take on this - any rewrite of _io_uring_get_cqe() is going to end up adding special cases, that's unfortunately just the nature of the game. And since we're going to be doing a new liburing release very shortly, this isn't a great time to add a rewrite of it. It'll certainly introduce more bugs than it solves, and hence regressions, no matter how careful we are. Hence my suggestion is to just patch this in a trivial kind of fashion, even if it doesn't really make the function any prettier. But it'll be safer for a release, and then we can rework the function after. With that in mind, here's my suggestion. The premise is if we go through the loop and don't do io_uring_enter(), then there's no point in continuing. That's the trivial fix. diff --git a/src/queue.c b/src/queue.c index 94f791e..4161aa7 100644 --- a/src/queue.c +++ b/src/queue.c @@ -89,12 +89,13 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt { struct io_uring_cqe *cqe = NULL; const int to_wait = data->wait_nr; - int ret = 0, err; + int err; do { bool cq_overflow_flush = false; unsigned flags = 0; unsigned nr_available; + int ret = -2; err = __io_uring_peek_cqe(ring, &cqe, &nr_available); if (err) @@ -117,7 +118,9 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt ret = __sys_io_uring_enter2(ring->ring_fd, data->submit, data->wait_nr, flags, data->arg, data->sz); - if (ret < 0) { + if (ret == -2) { + break; + } else if (ret < 0) { err = -errno; } else if (ret == (int)data->submit) { data->submit = 0; -- Jens Axboe