Currently we drop completion events if the CQ ring is full. That's fine
for requests with bounded completion times, but it may make it harder to
use io_uring with networked IO, where request completion times are
generally unbounded. The same goes for POLL, for example, which is also
unbounded.

This patch adds IORING_SETUP_CQ_NODROP, which changes the behavior a bit
for CQ ring overflows. First of all, it doesn't overflow the ring; it
simply stores a backlog of completions that we weren't able to put into
the CQ ring. To prevent the backlog from growing indefinitely, we apply
back pressure on IO submissions whenever the backlog is non-empty. Any
attempt to submit new IO with a non-empty backlog will get an -EBUSY
return from the kernel.

I think that makes for a pretty sane API in terms of how the application
can handle it. With CQ_NODROP enabled, we'll never drop a completion
event, but we'll also not allow submissions with a completion backlog.

Changes since v2:

- Add io_double_put_req() helper for the cases where we need to drop both
  the submit and complete reference. We didn't need this before as we
  could just free the request unconditionally, but we no longer know
  that's the case if add/fill grabs a reference to it.
- Fix linked request dropping.

Changes since v1:

- Drop the cqe_drop structure and allocation, simply use the io_kiocb
  for the overflow backlog
- Rebase on top of Pavel's series which made this cleaner
- Add prep patch for the fill/add CQ handler changes

 fs/io_uring.c                 | 209 +++++++++++++++++++++++-----------
 include/uapi/linux/io_uring.h |   1 +
 2 files changed, 143 insertions(+), 67 deletions(-)

-- 
Jens Axboe
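
For illustration only, here is a minimal userspace model of the semantics described above: completions that don't fit in the CQ ring go to a backlog, and submission returns -EBUSY while that backlog is non-empty. This is a sketch, not the code from the patch. All names (model_ring, model_complete, model_submit, model_reap), the ring size, the backlog cap, and the point at which the backlog is flushed back into the ring are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CQ_ENTRIES  4   /* hypothetical CQ ring size */
#define BACKLOG_MAX 64  /* cap for this toy model only, not from the patch */

/* Toy model of a CQ ring plus overflow backlog (hypothetical layout). */
struct model_ring {
	int cq[CQ_ENTRIES];
	size_t cq_count;
	int backlog[BACKLOG_MAX];
	size_t backlog_count;
};

/* Post a completion: into the ring if there is room, else onto the backlog. */
static int model_complete(struct model_ring *r, int res)
{
	if (r->cq_count < CQ_ENTRIES) {
		r->cq[r->cq_count++] = res;
		return 0;
	}
	if (r->backlog_count == BACKLOG_MAX)
		return -ENOMEM; /* limit of the model, not kernel behavior */
	r->backlog[r->backlog_count++] = res;
	return 0;
}

/* Back pressure: refuse new submissions while a backlog exists. */
static int model_submit(const struct model_ring *r)
{
	return r->backlog_count ? -EBUSY : 0;
}

/*
 * Reap up to max completions into out, then move backlog entries into
 * the freed ring slots in FIFO order. Returns the number reaped.
 * Flushing at reap time is an assumption of this model.
 */
static size_t model_reap(struct model_ring *r, int *out, size_t max)
{
	size_t n = r->cq_count < max ? r->cq_count : max;
	size_t i, moved = 0;

	assert(r->cq_count <= CQ_ENTRIES);

	for (i = 0; i < n; i++)
		out[i] = r->cq[i];
	for (i = n; i < r->cq_count; i++)
		r->cq[i - n] = r->cq[i];
	r->cq_count -= n;

	while (moved < r->backlog_count && r->cq_count < CQ_ENTRIES)
		r->cq[r->cq_count++] = r->backlog[moved++];
	for (i = moved; i < r->backlog_count; i++)
		r->backlog[i - moved] = r->backlog[i];
	r->backlog_count -= moved;

	return n;
}
```

In this model, an application that gets -EBUSY from a submit would reap completions until the backlog has drained and then retry the submission; no completion is lost along the way.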