On 12/18/2019 4:02 PM, Jens Axboe wrote: > On 12/18/19 2:38 AM, Pavel Begunkov wrote: >> On 12/18/2019 3:06 AM, Jens Axboe wrote: >>> On 12/17/19 4:55 PM, Jann Horn wrote: >>>> On Tue, Dec 17, 2019 at 11:54 PM Jens Axboe <axboe@xxxxxxxxx> wrote: >>>>> There is no reliable way to submit and wait in a single syscall, as >>>>> io_submit_sqes() may under-consume sqes (in case of an early error). >>>>> Then it will wait for not-yet-submitted requests, deadlocking the user >>>>> in most cases. >>>>> >>>>> In such cases adjust min_complete, so it won't wait for more than >>>>> what have been submitted in the current io_uring_enter() call. It >>>>> may be less than total in-flight, but that up to a user to handle. >>>>> >>>>> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx> >>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >>>> [...] >>>>> if (flags & IORING_ENTER_GETEVENTS) { >>>>> unsigned nr_events = 0; >>>>> >>>>> min_complete = min(min_complete, ctx->cq_entries); >>>>> + if (submitted != to_submit) >>>>> + min_complete = min(min_complete, (u32)submitted); >>>> >>>> Hm. Let's say someone submits two requests, first an ACCEPT request >>>> that might stall indefinitely and then a WRITE to a file on disk that >>>> is expected to complete quickly; and the caller uses min_complete=1 >>>> because they want to wait for the WRITE op. But now the submission of >>>> the WRITE fails, io_uring_enter() computes min_complete=min(1, 1)=1, >>>> and it blocks on the ACCEPT op. That would be bad, right? >>>> >>>> If the usecase I described is valid, I think it might make more sense >>>> to do something like this: >>>> >>>> u32 missing_submissions = to_submit - submitted; >>>> min_complete = min(min_complete, ctx->cq_entries); >>>> if ((flags & IORING_ENTER_GETEVENTS) && missing_submissions < min_complete) { >>>> min_complete -= missing_submissions; >>>> [...] >>>> } >>>> >>>> In other words: If we do a partially successful submission, only wait >>>> as long as we know that userspace definitely wants us to wait for one >>>> of the pending requests; and once we can't tell whether userspace >>>> intended to wait longer, return to userspace and let the user decide. >>>> >>>> Or it might make sense to just ignore IORING_ENTER_GETEVENTS >>>> completely in the partial submission case, in case userspace wants to >>>> immediately react to the failed request by writing out an error >>>> message to a socket or whatever. This case probably isn't >>>> performance-critical, right? And it would simplify things a bit. >>> >>> That's a good point, and Pavel's first patch actually did that. I >>> didn't consider the different request type case, which might be >>> uncommon but definitely valid. >>> >>> Probably the safest bet here is just to not wait at all if we fail >>> submitting all of them. This isn't a fast path, there was an error >>> somehow which meant we didn't submit it all. So just return the >>> submit count (including 0, not -EAGAIN) if we fail submitting, >>> and ignore IORING_ENTER_GETEVENTS for that case. >>> >> I see nothing wrong with -EAGAIN, it's returned only if it can't >> allocate memory for the first request. If so, can you then just take the >> v1? It will probably be applied cleanly. > > -EAGAIN for request alloc is fine, but your v1 also returned -EAGAIN if > someone asked to submit 0, which is a bug. We must return zero for that > case. > Now I see which -EAGAIN you meant. You're right, I'll resend > So your v1 without that would work, something ala: > > if (submitted != to_submit) > goto out; > > without the turning 0 into -EAGAIN unconditionally, io_submit_sqes() does > the right thing there as it is. > -- Pavel Begunkov