On 10/15/19 7:35 PM, yangerkun wrote:
>
>
> On 2019/10/15 21:59, yangerkun wrote:
>> Currently we calculate the sequence of a timeout with 'req->sequence =
>> ctx->cached_sq_head + count - 1' and pick the insertion point in
>> timeout_list by comparing how many requests we still expect to
>> complete. But this does not account for overflow:
>>
>> 1. ctx->cached_sq_head + count - 1 may overflow, so a new timeout req
>> with a bigger count can end up with a smaller req->sequence.
>>
>> 2. The current cached_sq_head may have wrapped around compared to an
>> earlier req, which again leaves the new timeout req with a smaller
>> req->sequence.
>>
>> Either overflow misorders timeout_list, which in turn makes the
>> timeouts complete in the wrong order. Fix it by reusing
>> req->submit.sequence to store the count, and change the insertion-sort
>> logic in io_timeout.
>>
>> Signed-off-by: yangerkun <yangerkun@xxxxxxxxxx>
>> ---
>>  fs/io_uring.c | 27 +++++++++++++++++++++------
>>  1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 76fdbe84aff5..c9512da06973 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1884,7 +1884,7 @@ static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer)
>>
>>  static int io_timeout(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>  {
>> -	unsigned count, req_dist, tail_index;
>> +	unsigned count;
>>  	struct io_ring_ctx *ctx = req->ctx;
>>  	struct list_head *entry;
>>  	struct timespec64 ts;
>> @@ -1907,21 +1907,36 @@ static int io_timeout(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>  		count = 1;
>>
>>  	req->sequence = ctx->cached_sq_head + count - 1;
>> +	/* reuse it to store the count */
>> +	req->submit.sequence = count;
>>  	req->flags |= REQ_F_TIMEOUT;
>>
>>  	/*
>>  	 * Insertion sort, ensuring the first entry in the list is always
>>  	 * the one we need first.
>>  	 */
>> -	tail_index = ctx->cached_cq_tail - ctx->rings->sq_dropped;
>> -	req_dist = req->sequence - tail_index;
>>  	spin_lock_irq(&ctx->completion_lock);
>>  	list_for_each_prev(entry, &ctx->timeout_list) {
>>  		struct io_kiocb *nxt = list_entry(entry, struct io_kiocb, list);
>> -		unsigned dist;
>> +		unsigned nxt_sq_head;
>> +		long long tmp, tmp_nxt;
>>
>> -		dist = nxt->sequence - tail_index;
>> -		if (req_dist >= dist)
>> +		/*
>> +		 * Since cached_sq_head + count - 1 can overflow, use type long
>> +		 * long to store it.
>> +		 */
>> +		tmp = (long long)ctx->cached_sq_head + count - 1;
>> +		nxt_sq_head = nxt->sequence - nxt->submit.sequence + 1;
>> +		tmp_nxt = (long long)nxt_sq_head + nxt->submit.sequence - 1;
>> +
>> +		/*
>> +		 * cached_sq_head may overflow, and it will never overflow twice
>> +		 * once there is some timeout req still be valid.
>> +		 */
>> +		if (ctx->cached_sq_head < nxt_sq_head)
>> +			tmp_nxt += UINT_MAX;
>
> There is a mistake here: it should be tmp, not tmp_nxt. So sorry about
> this.

I ran it through the basic testing, but I guess that doesn't catch overflow
cases. Maybe we can come up with one? Should be pretty simple to set up an
io_uring, post UINT_MAX - 10 nops (or something like that), then do some
timeout testing.

Just send an incremental patch to fix it.

--
Jens Axboe
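
For reference, a rough sketch of the kind of reproducer suggested above,
assuming liburing and its io_uring_prep_nop()/io_uring_prep_timeout()
helpers; the queue depth, batch handling, and the two timeout counts are
illustrative choices, not something specified in this thread:

/*
 * Sketch: drive the SQ head close to UINT_MAX with nops, then queue two
 * timeouts with different counts and see which one completes first. With
 * the overflow bug, the timeout whose sequence crosses the 32-bit wrap
 * can be sorted into the wrong place on timeout_list.
 */
#include <limits.h>
#include <stdio.h>
#include <liburing.h>

#define QD	1024	/* illustrative queue depth */

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
	unsigned long long done = 0, target = (unsigned long long)UINT_MAX - 10;
	unsigned i, todo;
	int ret;

	if (io_uring_queue_init(QD, &ring, 0))
		return 1;

	/* Post nops until cached_sq_head sits just below the 32-bit wrap. */
	while (done < target) {
		todo = (target - done) > QD ? QD : (unsigned)(target - done);
		for (i = 0; i < todo; i++)
			io_uring_prep_nop(io_uring_get_sqe(&ring));
		io_uring_submit_and_wait(&ring, todo);
		/* Reap the nop completions so the CQ ring never overflows. */
		for (i = 0; i < todo; i++) {
			io_uring_wait_cqe(&ring, &cqe);
			io_uring_cqe_seen(&ring, cqe);
		}
		done += todo;
	}

	/*
	 * Two timeouts: the count-20 one gets a sequence past the wrap, the
	 * count-2 one does not. The count-2 timeout should be the first to
	 * complete once a few more requests finish.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_timeout(sqe, &ts, 20, 0);
	sqe->user_data = 20;
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_timeout(sqe, &ts, 2, 0);
	sqe->user_data = 2;
	io_uring_submit(&ring);

	/* Post a few nops so the count-2 timeout can trigger by count. */
	for (i = 0; i < 5; i++)
		io_uring_prep_nop(io_uring_get_sqe(&ring));
	io_uring_submit(&ring);

	/* Skip the nop completions; report the first timeout CQE we see. */
	do {
		io_uring_wait_cqe(&ring, &cqe);
		ret = (int)cqe->user_data;
		io_uring_cqe_seen(&ring, cqe);
	} while (!ret);
	printf("first timeout to complete: count %d (expect 2)\n", ret);

	io_uring_queue_exit(&ring);
	return 0;
}

Submitting on the order of UINT_MAX nops takes a while, so such a test
would likely only run as an opt-in, long-running case.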