Re: [PATCH] io_uring: export cq overflow status to userspace

On 7/8/20 10:51 AM, Xiaoguang Wang wrote:
> hi,
> 
>> On 7/8/20 9:39 AM, Xiaoguang Wang wrote:
>>> hi,
>>>
>>>> On 7/7/20 11:29 PM, Xiaoguang Wang wrote:
>>>>> I modify above test program a bit:
>>>>> #include <errno.h>
>>>>> #include <stdio.h>
>>>>> #include <unistd.h>
>>>>> #include <stdlib.h>
>>>>> #include <string.h>
>>>>> #include <fcntl.h>
>>>>> #include <assert.h>
>>>>>
>>>>> #include "liburing.h"
>>>>>
>>>>> static void test_cq_overflow(struct io_uring *ring)
>>>>> {
>>>>>            struct io_uring_cqe *cqe;
>>>>>            struct io_uring_sqe *sqe;
>>>>>            int issued = 0;
>>>>>            int ret = 0;
>>>>>            int i;
>>>>>
>>>>>            for (i = 0; i < 33; i++) {
>>>>>                    sqe = io_uring_get_sqe(ring);
>>>>>                    if (!sqe) {
>>>>>                            fprintf(stderr, "get sqe failed\n");
>>>>>                            break;
>>>>>                    }
>>>>>                    io_uring_prep_nop(sqe);
>>>>>                    ret = io_uring_submit(ring);
>>>>>                    if (ret <= 0) {
>>>>>                            if (ret != -EBUSY)
>>>>>                                    fprintf(stderr, "sqe submit failed: %d\n", ret);
>>>>>                            break;
>>>>>                    }
>>>>>                    issued++;
>>>>>            }
>>>>>
>>>>>            printf("issued requests: %d\n", issued);
>>>>>
>>>>>            while (issued) {
>>>>>                    ret = io_uring_peek_cqe(ring, &cqe);
>>>>>                    if (ret) {
>>>>>                            if (ret != -EAGAIN) {
>>>>>                                    fprintf(stderr, "peek completion failed: %s\n",
>>>>>                                            strerror(-ret));
>>>>>                                    break;
>>>>>                            }
>>>>>                            printf("left requests: %d %d\n", issued, IO_URING_READ_ONCE(*ring->sq.kflags));
>>>>>                            continue;
>>>>>                    }
>>>>>                    io_uring_cqe_seen(ring, cqe);
>>>>>                    issued--;
>>>>>                    printf("left requests: %d\n", issued);
>>>>>            }
>>>>> }
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>>            int ret;
>>>>>            struct io_uring ring;
>>>>>
>>>>>            ret = io_uring_queue_init(16, &ring, 0);
>>>>>            if (ret) {
>>>>>                    fprintf(stderr, "ring setup failed: %d\n", ret);
>>>>>                    return 1;
>>>>>            }
>>>>>
>>>>>            test_cq_overflow(&ring);
>>>>>            return 0;
>>>>> }
>>>>>
>>>>> Though with your patches applied, we still cannot peek the last cqe.
>>>>> This test program only issues 33 sqes, so it won't hit the EBUSY error.
>>>>
>>>> How about we make this even simpler, then - make the
>>>> IORING_SQ_CQ_OVERFLOW actually track the state, rather than when we fail
>>>> on submission. The liburing change would be the same, the kernel side
>>>> would then look like the below.
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 4c9a494c9f9f..01981926cdf4 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1342,6 +1342,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>>>>    	if (cqe) {
>>>>    		clear_bit(0, &ctx->sq_check_overflow);
>>>>    		clear_bit(0, &ctx->cq_check_overflow);
>>>> +		ctx->rings->sq_flags &= ~IORING_SQ_CQ_OVERFLOW;
>>>>    	}
>>>>    	spin_unlock_irqrestore(&ctx->completion_lock, flags);
>>>>    	io_cqring_ev_posted(ctx);
>>>> @@ -1379,6 +1380,7 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
>>>>    		if (list_empty(&ctx->cq_overflow_list)) {
>>>>    			set_bit(0, &ctx->sq_check_overflow);
>>>>    			set_bit(0, &ctx->cq_check_overflow);
>>>> +			ctx->rings->sq_flags |= IORING_SQ_CQ_OVERFLOW;
> Some callers of __io_cqring_fill_event() don't hold completion_lock, for example:
> ==> io_iopoll_complete
> ====> __io_cqring_fill_event()
> So this patch may still not be safe when SQPOLL is enabled.
> Do you prefer adding a new lock, or taking completion_lock here only when the cq ring has overflowed?

The polled side isn't IRQ driven, so should be serialized separately. This works
because we don't allow non-polled IO on a polled context, and vice versa. If not,
we'd have bigger issues than just the flags modification.

So it should be fine as-is.

-- 
Jens Axboe



