On 11/9/2019 3:25 PM, Pavel Begunkov wrote:
> On 11/7/2019 7:00 PM, Jens Axboe wrote:
>> Currently we drop completion events if the CQ ring is full. That's fine
>> for requests with bounded completion times, but it may make it harder to
>> use io_uring with networked IO, where request completion times are
>> generally unbounded. Or with POLL, for example, which is also unbounded.
>>
>> This patch adds IORING_SETUP_CQ_NODROP, which changes the behavior a bit
>> for CQ ring overflows. First of all, it doesn't overflow the ring; it
>> simply stores a backlog of completions that we weren't able to put into
>> the CQ ring. To prevent the backlog from growing indefinitely, if the
>> backlog is non-empty, we apply back pressure on IO submissions. Any
>> attempt to submit new IO with a non-empty backlog will get an -EBUSY
>> return from the kernel. This is a signal to the application that it has
>> backlogged CQ events, and that it must reap those before being allowed
>> to submit more IO.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>> ---
>>  fs/io_uring.c                 | 103 ++++++++++++++++++++++++++++------
>>  include/uapi/linux/io_uring.h |   1 +
>>  2 files changed, 87 insertions(+), 17 deletions(-)
>>
>> +static void io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>> +{
>> +	struct io_rings *rings = ctx->rings;
>> +	struct io_uring_cqe *cqe;
>> +	struct io_kiocb *req;
>> +	unsigned long flags;
>> +	LIST_HEAD(list);
>> +
>> +	if (list_empty_careful(&ctx->cq_overflow_list))
>> +		return;
>> +	if (ctx->cached_cq_tail - READ_ONCE(rings->cq.head) ==
>> +	    rings->cq_ring_entries)
>> +		return;
>> +
>> +	spin_lock_irqsave(&ctx->completion_lock, flags);
>> +
>> +	while (!list_empty(&ctx->cq_overflow_list)) {
>> +		cqe = io_get_cqring(ctx);
>> +		if (!cqe && !force)
>> +			break;
>> +
>> +		req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb,
>> +				       list);
>> +		list_move(&req->list, &list);
>> +		if (cqe) {
>> +			WRITE_ONCE(cqe->user_data, req->user_data);
>> +			WRITE_ONCE(cqe->res, req->result);
>> +			WRITE_ONCE(cqe->flags, 0);
>> +		}
>
> Hmm, second thought. We should account the overflow here.

Clarification: we should account the overflow in the (!cqe) case, i.e.:

	if (!cqe) {	/* the else branch of the "if (cqe)" above */
		WRITE_ONCE(ctx->rings->cq_overflow,
			   atomic_inc_return(&ctx->cached_cq_overflow));
	}

>> +	}
>> +
>> +	io_commit_cqring(ctx);
>> +	spin_unlock_irqrestore(&ctx->completion_lock, flags);
>> +	io_cqring_ev_posted(ctx);
>> +
>> +	while (!list_empty(&list)) {
>> +		req = list_first_entry(&list, struct io_kiocb, list);
>> +		list_del(&req->list);
>> +		io_put_req(req, NULL);
>> +	}
>> +}
>> +

--
Pavel Begunkov
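
For reference, folding the suggested accounting into the quoted flush loop
would look roughly like the sketch below. This is only a sketch against the
excerpt above; it assumes ctx->cached_cq_overflow is an atomic_t and that
rings->cq_overflow is the shared counter userspace reads, neither of which
the quoted hunk shows:

	while (!list_empty(&ctx->cq_overflow_list)) {
		cqe = io_get_cqring(ctx);
		if (!cqe && !force)
			break;

		req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb,
				       list);
		list_move(&req->list, &list);
		if (cqe) {
			/* room in the CQ ring: post the completion */
			WRITE_ONCE(cqe->user_data, req->user_data);
			WRITE_ONCE(cqe->res, req->result);
			WRITE_ONCE(cqe->flags, 0);
		} else {
			/* forced flush with the ring still full: account the
			 * dropped CQE so userspace can detect the loss */
			WRITE_ONCE(ctx->rings->cq_overflow,
				   atomic_inc_return(&ctx->cached_cq_overflow));
		}
	}

The point of the else branch is that a forced flush can still drop events,
and without the accounting userspace would have no way to notice.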
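
On the submission side, the back pressure described in the commit message
means a submit can now fail with -EBUSY until the backlog is reaped. A
minimal userspace sketch, assuming liburing and a hypothetical application
callback handle_cqe() (neither appears in the patch); when submit returns
-EBUSY the CQ ring is full, so there are completions ready to consume:

	#include <liburing.h>

	static int submit_reaping_backlog(struct io_uring *ring)
	{
		struct io_uring_cqe *cqe;
		int ret;

		while ((ret = io_uring_submit(ring)) == -EBUSY) {
			/* drain ready CQEs; the next submit lets the kernel
			 * flush its backlog into the freed ring slots */
			while (io_uring_peek_cqe(ring, &cqe) == 0) {
				handle_cqe(cqe);	/* hypothetical callback */
				io_uring_cqe_seen(ring, cqe);
			}
		}
		return ret;
	}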