On 7/7/20 7:24 AM, Xiaoguang Wang wrote:
> For those applications which are not willing to use io_uring_enter()
> to reap and handle cqes, they may completely rely on liburing's
> io_uring_peek_cqe(), but if the cq ring has overflowed, then because
> io_uring_peek_cqe() is currently not aware of this overflow, it won't
> enter the kernel to flush cqes. The test program below can reveal
> this bug:
>
> static void test_cq_overflow(struct io_uring *ring)
> {
>         struct io_uring_cqe *cqe;
>         struct io_uring_sqe *sqe;
>         int issued = 0;
>         int ret = 0;
>
>         do {
>                 sqe = io_uring_get_sqe(ring);
>                 if (!sqe) {
>                         fprintf(stderr, "get sqe failed\n");
>                         break;
>                 }
>                 ret = io_uring_submit(ring);
>                 if (ret <= 0) {
>                         if (ret != -EBUSY)
>                                 fprintf(stderr, "sqe submit failed: %d\n", ret);
>                         break;
>                 }
>                 issued++;
>         } while (ret > 0);
>         assert(ret == -EBUSY);
>
>         printf("issued requests: %d\n", issued);
>
>         while (issued) {
>                 ret = io_uring_peek_cqe(ring, &cqe);
>                 if (ret) {
>                         if (ret != -EAGAIN) {
>                                 fprintf(stderr, "peek completion failed: %s\n",
>                                         strerror(ret));
>                                 break;
>                         }
>                         printf("left requests: %d\n", issued);
>                         continue;
>                 }
>                 io_uring_cqe_seen(ring, cqe);
>                 issued--;
>                 printf("left requests: %d\n", issued);
>         }
> }
>
> int main(int argc, char *argv[])
> {
>         int ret;
>         struct io_uring ring;
>
>         ret = io_uring_queue_init(16, &ring, 0);
>         if (ret) {
>                 fprintf(stderr, "ring setup failed: %d\n", ret);
>                 return 1;
>         }
>
>         test_cq_overflow(&ring);
>         return 0;
> }
>
> To fix this issue, export the cq overflow status to userspace; then
> helper functions in liburing, such as io_uring_peek_cqe(), can be
> aware of this cq overflow and flush accordingly.

Is there any way we can accomplish the same without exporting another
set of flags? Would it be enough for the SQPOLL thread to set
IORING_SQ_NEED_WAKEUP if we're in an overflow condition? That should
result in the app entering the kernel when it has flushed the user CQ
side, and then the sqthread could attempt to flush the pending events
as well.
Something like this, totally untested...

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d37d7ea5ebe5..d409bd68553f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6110,8 +6110,18 @@ static int io_sq_thread(void *data)
 		}
 
 		mutex_lock(&ctx->uring_lock);
-		if (likely(!percpu_ref_is_dying(&ctx->refs)))
+		if (likely(!percpu_ref_is_dying(&ctx->refs))) {
+retry:
 			ret = io_submit_sqes(ctx, to_submit, NULL, -1);
+			if (unlikely(ret == -EBUSY)) {
+				ctx->rings->sq_flags |= IORING_SQ_NEED_WAKEUP;
+				smp_mb();
+				if (io_cqring_overflow_flush(ctx, false)) {
+					ctx->rings->sq_flags &= ~IORING_SQ_NEED_WAKEUP;
+					goto retry;
+				}
+			}
+		}
 		mutex_unlock(&ctx->uring_lock);
 		timeout = jiffies + ctx->sq_thread_idle;
 	}

-- 
Jens Axboe