On Tue, Dec 17, 2019 at 11:54 PM Jens Axboe <axboe@xxxxxxxxx> wrote: > There is no reliable way to submit and wait in a single syscall, as > io_submit_sqes() may under-consume sqes (in case of an early error). > Then it will wait for not-yet-submitted requests, deadlocking the user > in most cases. > > In such cases adjust min_complete, so it won't wait for more than > what have been submitted in the current io_uring_enter() call. It > may be less than total in-flight, but that up to a user to handle. > > Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx> > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> [...] > if (flags & IORING_ENTER_GETEVENTS) { > unsigned nr_events = 0; > > min_complete = min(min_complete, ctx->cq_entries); > + if (submitted != to_submit) > + min_complete = min(min_complete, (u32)submitted); Hm. Let's say someone submits two requests, first an ACCEPT request that might stall indefinitely and then a WRITE to a file on disk that is expected to complete quickly; and the caller uses min_complete=1 because they want to wait for the WRITE op. But now the submission of the WRITE fails, io_uring_enter() computes min_complete=min(1, 1)=1, and it blocks on the ACCEPT op. That would be bad, right? If the usecase I described is valid, I think it might make more sense to do something like this: u32 missing_submissions = to_submit - submitted; min_complete = min(min_complete, ctx->cq_entries); if ((flags & IORING_ENTER_GETEVENTS) && missing_submissions < min_complete) { min_complete -= missing_submissions; [...] } In other words: If we do a partially successful submission, only wait as long as we know that userspace definitely wants us to wait for one of the pending requests; and once we can't tell whether userspace intended to wait longer, return to userspace and let the user decide. Or it might make sense to just ignore IORING_ENTER_GETEVENTS completely in the partial submission case, in case userspace wants to immediately react to the failed request by writing out an error message to a socket or whatever. This case probably isn't performance-critical, right? And it would simplify things a bit.