Re: [PATCH] io_uring: export cq overflow status to userspace

Xiaoguang Wang <xiaoguang.wang@xxxxxxxxxxxxxxxxx> · Wed, 8 Jul 2020 00:29:39 +0800

hi,

On 7/7/20 7:24 AM, Xiaoguang Wang wrote:
Applications that do not want to call io_uring_enter() to reap and
handle cqes may rely entirely on liburing's io_uring_peek_cqe().
But if the cq ring has overflowed, io_uring_peek_cqe() is currently
unaware of the overflow and won't enter the kernel to flush cqes.
The test program below reveals this bug:

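#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <liburing.h>

static void test_cq_overflow(struct io_uring *ring)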
{
         struct io_uring_cqe *cqe;
         struct io_uring_sqe *sqe;
         int issued = 0;
         int ret = 0;

         do {
                 sqe = io_uring_get_sqe(ring);
                 if (!sqe) {
                         fprintf(stderr, "get sqe failed\n");
                         break;
                 }
                 /* Prepare a no-op so the sqe contents are well defined. */
                 io_uring_prep_nop(sqe);
                 ret = io_uring_submit(ring);
                 if (ret <= 0) {
                         if (ret != -EBUSY)
                                 fprintf(stderr, "sqe submit failed: %d\n", ret);
                         break;
                 }
                 issued++;
         } while (ret > 0);
         assert(ret == -EBUSY);

         printf("issued requests: %d\n", issued);

         while (issued) {
                 ret = io_uring_peek_cqe(ring, &cqe);
                 if (ret) {
                         if (ret != -EAGAIN) {
                                 /* liburing returns a negative errno value. */
                                 fprintf(stderr, "peek completion failed: %s\n",
                                         strerror(-ret));
                                 break;
                         }
                         printf("left requests: %d\n", issued);
                         continue;
                 }
                 io_uring_cqe_seen(ring, cqe);
                 issued--;
                 printf("left requests: %d\n", issued);
         }
}

int main(int argc, char *argv[])
{
         int ret;
         struct io_uring ring;

         ret = io_uring_queue_init(16, &ring, 0);
         if (ret) {
                 fprintf(stderr, "ring setup failed: %d\n", ret);
                 return 1;
         }

         test_cq_overflow(&ring);
         return 0;
}

To fix this issue, export the cq overflow status to userspace, so that
liburing helper functions such as io_uring_peek_cqe() can become
aware of the cq overflow and flush accordingly.
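
As a rough illustration of that idea (a toy model only, not the patch and
not liburing code), the sketch below simulates a CQ ring with an overflow
backlog and a status flag visible to userspace. Every name in it
(mock_ring, MOCK_CQ_OVERFLOW, mock_flush(), peek_plain(),
peek_overflow_aware(), reap_all()) is hypothetical, and mock_flush()
merely stands in for entering the kernel to flush overflowed cqes.

```c
#include <stdbool.h>

#define CQ_ENTRIES       4    /* toy CQ ring size (assumption) */
#define MOCK_CQ_OVERFLOW 1u   /* hypothetical exported status bit */

/* Toy model of the state shared between "kernel" and "userspace". */
struct mock_ring {
	unsigned head, tail;  /* CQ ring indices; tail - head = visible cqes */
	unsigned backlog;     /* completions parked because the ring was full */
	unsigned flags;       /* status word exported to userspace */
};

/* "Kernel" side: post a completion, parking it if the ring is full. */
static void mock_post_cqe(struct mock_ring *r)
{
	if (r->tail - r->head == CQ_ENTRIES) {
		r->backlog++;
		r->flags |= MOCK_CQ_OVERFLOW;
		return;
	}
	r->tail++;
}

/* "Kernel" side: move parked completions into free ring slots.
 * Stands in for entering the kernel to flush the overflow. */
static void mock_flush(struct mock_ring *r)
{
	while (r->backlog && r->tail - r->head < CQ_ENTRIES) {
		r->tail++;
		r->backlog--;
	}
	if (!r->backlog)
		r->flags &= ~MOCK_CQ_OVERFLOW;
}

/* Peek that only looks at the ring and never "enters the kernel". */
static bool peek_plain(struct mock_ring *r)
{
	if (r->head == r->tail)
		return false;
	r->head++;            /* consume one completion */
	return true;
}

/* Peek that consults the exported flag before reporting "empty". */
static bool peek_overflow_aware(struct mock_ring *r)
{
	if (r->head == r->tail && (r->flags & MOCK_CQ_OVERFLOW))
		mock_flush(r);
	return peek_plain(r);
}

/* Reap until the given peek function reports an empty ring. */
static int reap_all(struct mock_ring *r, bool (*peek)(struct mock_ring *))
{
	int n = 0;

	while (peek(r))
		n++;
	return n;
}
```

With CQ_ENTRIES set to 4 and ten completions posted, reap_all() with
peek_plain() returns 4 and leaves 6 completions stranded in the backlog,
while reap_all() with peek_overflow_aware() returns all 10. The real test
program above spins forever instead of stopping, because it keeps
retrying on -EAGAIN.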

Is there any way we can accomplish the same without exporting
another set of flags? 
I understand your concerns and will try to find a better method later,
but I'm not sure there is one :)

Would it be enough for the SQPOLL thread to set
IORING_SQ_NEED_WAKEUP if we're in overflow condition? That should
result in the app entering the kernel when it's flushed the user CQ
side, and then the sqthread could attempt to flush the pending
events as well.

Something like this, totally untested...
I haven't tested your patch, but I don't think it works for the non-SQPOLL
case: my test program above doesn't have SQPOLL enabled.

Regards,
Xiaoguang Wang

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d37d7ea5ebe5..d409bd68553f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6110,8 +6110,18 @@ static int io_sq_thread(void *data)
  		}
  
  		mutex_lock(&ctx->uring_lock);
-		if (likely(!percpu_ref_is_dying(&ctx->refs)))
+		if (likely(!percpu_ref_is_dying(&ctx->refs))) {
+retry:
  			ret = io_submit_sqes(ctx, to_submit, NULL, -1);
+			if (unlikely(ret == -EBUSY)) {
+				ctx->rings->sq_flags |= IORING_SQ_NEED_WAKEUP;
+				smp_mb();
+				if (io_cqring_overflow_flush(ctx, false)) {
+					ctx->rings->sq_flags &= ~IORING_SQ_NEED_WAKEUP;
+					goto retry;
+				}
+			}
+		}
  		mutex_unlock(&ctx->uring_lock);
  		timeout = jiffies + ctx->sq_thread_idle;
  	}