Re: [PATCH] io_uring: fix possible deadlock in io_uring_poll

Pavel Begunkov <asml.silence@xxxxxxxxx> · Wed, 3 Feb 2021 00:04:02 +0000

On 02/02/2021 19:52, Hao Xu wrote:
> This might happen if we do epoll_wait on a uring fd while reading/writing
> the former epoll fd in a sqe in the former uring instance.
> So let's don't flush cqring overflow list when we fail to get the uring
> lock. This leads to less accuracy, but is still ok.

if (io_cqring_events(ctx) || test_bit(0, &ctx->cq_check_overflow))
        mask |= EPOLLIN | EPOLLRDNORM;

Instead of flushing. It'd make sense if we define poll as "there might
be something, go do your peek/wait with overflow checks". Jens, is that
documented anywhere?

> 
> Reported-by: Abaci <abaci@xxxxxxxxxxxxxxxxx>
> Fixes: 6c503150ae33 ("io_uring: patch up IOPOLL overflow_flush sync")
> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
> ---
> 
> Here I use mutex_trylock() to fix this issue, but this causes loss of
> accuracy. I think doing cqring overflow flush in a task work maybe a
> better solution. I'm think of this. Any thoughts?
> 
>  fs/io_uring.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 38c6cbe1ab38..866e45d42ac7 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8718,7 +8718,36 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
>  	smp_rmb();
>  	if (!io_sqring_full(ctx))
>  		mask |= EPOLLOUT | EPOLLWRNORM;
> -	io_cqring_overflow_flush(ctx, false, NULL, NULL);
> +
> +	if (test_bit(0, &ctx->cq_check_overflow)) {
> +		bool should_flush = true;
> +		/* iopoll syncs against uring_lock, not completion_lock */
> +		if (ctx->flags & IORING_SETUP_IOPOLL) {
> +			/*
> +			 * avoid ABBA deadlock.
> +			 * there could be contention like below:
> +			 *      CPU0                    CPU1
> +			 *      ----                    ----
> +			 * lock(&ctx->uring_lock);
> +			 *                              lock(&ep->mtx);
> +			 *                              lock(&ctx->uring_lock);
> +			 * lock(&ep->mtx);
> +			 *
> +			 * this might happen if we do epoll_wait on a uring fd while
> +			 * read/write the former epoll fd in a sqe in the former uring
> +			 * instance.
> +			 * We don't flush cqring overflow list when we fail to get the
> +			 * uring lock. This leads to less accuracy, but is still ok.
> +			 */
> +			should_flush = mutex_trylock(&ctx->uring_lock);
> +		}
> +		if (should_flush) {
> +			__io_cqring_overflow_flush(ctx, false, NULL, NULL);
> +			if (ctx->flags & IORING_SETUP_IOPOLL)
> +				mutex_unlock(&ctx->uring_lock);
> +		}
> +	}
> +
>  	if (io_cqring_events(ctx))
>  		mask |= EPOLLIN | EPOLLRDNORM;
>  
> 

-- 
Pavel Begunkov