Re: [PATCH v2 1/2] io_uring: only increment ->cq_timeouts along with ->cached_cq_tail

On 19/12/2020 19:15, Marcelo Diop-Gonzalez wrote:
> The quantity ->cached_cq_tail - ->cq_timeouts is used to tell how many
> non-timeout events have happened, but this subtraction could overflow
> if ->cq_timeouts is incremented more times than ->cached_cq_tail.
> It's maybe unlikely, but currently this can happen if a timeout event
> overflows the cqring, since in that case io_get_cqring() doesn't
> increment ->cached_cq_tail, but ->cq_timeouts is incremented by the
> caller. Fix it by incrementing ->cq_timeouts inside io_get_cqring().
> 
> Signed-off-by: Marcelo Diop-Gonzalez <marcelo827@xxxxxxxxx>
> ---
>  fs/io_uring.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index f3690dfdd564..f394bf358022 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1582,8 +1582,6 @@ static void io_kill_timeout(struct io_kiocb *req)
>  
>  	ret = hrtimer_try_to_cancel(&io->timer);
>  	if (ret != -1) {
> -		atomic_set(&req->ctx->cq_timeouts,
> -			atomic_read(&req->ctx->cq_timeouts) + 1);
>  		list_del_init(&req->timeout.list);
>  		io_cqring_fill_event(req, 0);
>  		io_put_req_deferred(req, 1);
> @@ -1664,7 +1662,7 @@ static inline bool io_sqring_full(struct io_ring_ctx *ctx)
>  	return READ_ONCE(r->sq.tail) - ctx->cached_sq_head == r->sq_ring_entries;
>  }
>  
> -static struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx)
> +static struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx, u8 opcode)
>  {
>  	struct io_rings *rings = ctx->rings;
>  	unsigned tail;
> @@ -1679,6 +1677,10 @@ static struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx)
>  		return NULL;
>  
>  	ctx->cached_cq_tail++;
> +	if (opcode == IORING_OP_TIMEOUT)
> +		atomic_set(&ctx->cq_timeouts,
> +			   atomic_read(&ctx->cq_timeouts) + 1);
> +

Don't think I like it. The function is pretty hot, so I wouldn't want that extra
burden just for timeouts, which should be cold enough, especially with the new
timeout CQ waits. Also, passing the opcode here is awkward and not great
abstraction-wise.
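
For reference, a minimal standalone sketch of the wraparound the commit message
is guarding against (illustrative C with stand-in variables, not the actual
kernel code): with unsigned counters, bumping cq_timeouts without also bumping
cached_cq_tail makes the difference wrap to a huge value.

#include <stdio.h>

int main(void)
{
	/* stand-ins for ctx->cached_cq_tail and ctx->cq_timeouts */
	unsigned int cached_cq_tail = 10;	/* CQEs actually posted */
	unsigned int cq_timeouts = 11;		/* timeout counted, but its CQE overflowed */

	/* "non-timeout events" = tail - timeouts; unsigned math wraps past zero */
	printf("%u\n", cached_cq_tail - cq_timeouts);	/* prints 4294967295 */
	return 0;
}

Incrementing both counters in the same place, as the patch intends, is what
keeps that difference from ever going "negative".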

>  	return &rings->cqes[tail & ctx->cq_mask];
>  }
>  
> @@ -1728,7 +1730,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
>  		if (!io_match_task(req, tsk, files))
>  			continue;
>  
> -		cqe = io_get_cqring(ctx);
> +		cqe = io_get_cqring(ctx, req->opcode);
>  		if (!cqe && !force)
>  			break;
>  
> @@ -1776,7 +1778,7 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
>  	 * submission (by quite a lot). Increment the overflow count in
>  	 * the ring.
>  	 */
> -	cqe = io_get_cqring(ctx);
> +	cqe = io_get_cqring(ctx, req->opcode);
>  	if (likely(cqe)) {
>  		WRITE_ONCE(cqe->user_data, req->user_data);
>  		WRITE_ONCE(cqe->res, res);
> @@ -5618,8 +5620,6 @@ static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer)
>  
>  	spin_lock_irqsave(&ctx->completion_lock, flags);
>  	list_del_init(&req->timeout.list);
> -	atomic_set(&req->ctx->cq_timeouts,
> -		atomic_read(&req->ctx->cq_timeouts) + 1);
>  
>  	io_cqring_fill_event(req, -ETIME);
>  	io_commit_cqring(ctx);
> 

-- 
Pavel Begunkov


