On 7/8/20 10:51 AM, Xiaoguang Wang wrote:
> hi,
> 
>> On 7/8/20 9:39 AM, Xiaoguang Wang wrote:
>>> hi,
>>> 
>>>> On 7/7/20 11:29 PM, Xiaoguang Wang wrote:
>>>>> I modified the above test program a bit:
>>>>> #include <errno.h>
>>>>> #include <stdio.h>
>>>>> #include <unistd.h>
>>>>> #include <stdlib.h>
>>>>> #include <string.h>
>>>>> #include <fcntl.h>
>>>>> #include <assert.h>
>>>>>
>>>>> #include "liburing.h"
>>>>>
>>>>> static void test_cq_overflow(struct io_uring *ring)
>>>>> {
>>>>> 	struct io_uring_cqe *cqe;
>>>>> 	struct io_uring_sqe *sqe;
>>>>> 	int issued = 0;
>>>>> 	int ret = 0;
>>>>> 	int i;
>>>>>
>>>>> 	for (i = 0; i < 33; i++) {
>>>>> 		sqe = io_uring_get_sqe(ring);
>>>>> 		if (!sqe) {
>>>>> 			fprintf(stderr, "get sqe failed\n");
>>>>> 			break;
>>>>> 		}
>>>>> 		ret = io_uring_submit(ring);
>>>>> 		if (ret <= 0) {
>>>>> 			if (ret != -EBUSY)
>>>>> 				fprintf(stderr, "sqe submit failed: %d\n", ret);
>>>>> 			break;
>>>>> 		}
>>>>> 		issued++;
>>>>> 	}
>>>>>
>>>>> 	printf("issued requests: %d\n", issued);
>>>>>
>>>>> 	while (issued) {
>>>>> 		ret = io_uring_peek_cqe(ring, &cqe);
>>>>> 		if (ret) {
>>>>> 			if (ret != -EAGAIN) {
>>>>> 				fprintf(stderr, "peek completion failed: %s\n",
>>>>> 					strerror(ret));
>>>>> 				break;
>>>>> 			}
>>>>> 			printf("left requests: %d %d\n", issued,
>>>>> 				IO_URING_READ_ONCE(*ring->sq.kflags));
>>>>> 			continue;
>>>>> 		}
>>>>> 		io_uring_cqe_seen(ring, cqe);
>>>>> 		issued--;
>>>>> 		printf("left requests: %d\n", issued);
>>>>> 	}
>>>>> }
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>> 	int ret;
>>>>> 	struct io_uring ring;
>>>>>
>>>>> 	ret = io_uring_queue_init(16, &ring, 0);
>>>>> 	if (ret) {
>>>>> 		fprintf(stderr, "ring setup failed: %d\n", ret);
>>>>> 		return 1;
>>>>> 	}
>>>>>
>>>>> 	test_cq_overflow(&ring);
>>>>> 	return 0;
>>>>> }
>>>>>
>>>>> Though with your patches applied, we still cannot peek the last cqe.
>>>>> This test program only issues 33 sqes, so it won't get an EBUSY error.
>>>>
>>>> How about we make this even simpler, then - make IORING_SQ_CQ_OVERFLOW
>>>> actually track the overflow state, rather than setting it only when we
>>>> fail on submission. The liburing change would be the same, the kernel
>>>> side would then look like the below.
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 4c9a494c9f9f..01981926cdf4 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1342,6 +1342,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>>>>  	if (cqe) {
>>>>  		clear_bit(0, &ctx->sq_check_overflow);
>>>>  		clear_bit(0, &ctx->cq_check_overflow);
>>>> +		ctx->rings->sq_flags &= ~IORING_SQ_CQ_OVERFLOW;
>>>>  	}
>>>>  	spin_unlock_irqrestore(&ctx->completion_lock, flags);
>>>>  	io_cqring_ev_posted(ctx);
>>>> @@ -1379,6 +1380,7 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
>>>>  		if (list_empty(&ctx->cq_overflow_list)) {
>>>>  			set_bit(0, &ctx->sq_check_overflow);
>>>>  			set_bit(0, &ctx->cq_check_overflow);
>>>> +			ctx->rings->sq_flags |= IORING_SQ_CQ_OVERFLOW;
> Some callers of __io_cqring_fill_event() don't hold completion_lock, for example:
>   ==> io_iopoll_complete
>     ====> __io_cqring_fill_event()
> So this patch may still not be safe when SQPOLL is enabled.
> Do you prefer adding a new lock, or just taking completion_lock here only
> when the cq ring has overflowed?

The polled side isn't IRQ driven, so it should be serialized separately.
This works because we don't allow non-polled IO on a polled context, and
vice versa. If we did, we'd have bigger issues than just this flags
modification. So it should be fine as-is.

-- 
Jens Axboe
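
[Editor's sketch, for context: the thread only alludes to the userspace half
("the liburing change would be the same"). Below is an untested sketch of
what that side could look like: if no CQE is ready but the kernel has set
IORING_SQ_CQ_OVERFLOW in the SQ ring flags, enter the kernel with
IORING_ENTER_GETEVENTS so the overflowed completions get flushed back into
the CQ ring. The helper name is made up for illustration, it assumes the
IORING_SQ_CQ_OVERFLOW flag added by the patch above is present in the
installed kernel headers, and the real liburing change lives inside its
peek/wait path rather than in a separate helper.]

#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>

#include "liburing.h"

/*
 * Sketch only: peek for a CQE and, on -EAGAIN, check whether the kernel
 * flagged a CQ overflow; if so, ask it to flush the overflow list and
 * retry the peek once.
 */
static int peek_cqe_flush_overflow(struct io_uring *ring,
				   struct io_uring_cqe **cqe_ptr)
{
	int ret;

	ret = io_uring_peek_cqe(ring, cqe_ptr);
	if (ret != -EAGAIN)
		return ret;

	/* Nothing in the CQ ring; is the kernel holding overflowed CQEs? */
	if (!(IO_URING_READ_ONCE(*ring->sq.kflags) & IORING_SQ_CQ_OVERFLOW))
		return ret;

	/* Enter the kernel so it can flush cq_overflow_list into the ring. */
	ret = syscall(__NR_io_uring_enter, ring->ring_fd, 0, 0,
		      IORING_ENTER_GETEVENTS, NULL, _NSIG / 8);
	if (ret < 0)
		return -errno;

	return io_uring_peek_cqe(ring, cqe_ptr);
}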