The problem involves two asynchronous paths. One is a new IO calling
blk_mq_start_request() to start dispatch. The other is the block layer
timer calling blk_mq_req_expired() to check whether any IO has timed
out. If the stores done by blk_add_timer() and
WRITE_ONCE(rq->state, MQ_RQ_IN_FLIGHT) in blk_mq_start_request() are
reordered while the block timer is checking the new IO, the timer sees
the request state already set to MQ_RQ_IN_FLIGHT while req->deadline is
still 0, and the new IO is misjudged as timed out.

Our fix is to not treat a deadline of 0 as a timeout. In addition,
because jiffies on a 32-bit system wraps around shortly after boot, a
valid deadline can be computed as 0. In that case blk_add_timer() sets
the lowest bit of the deadline so that a valid deadline is never 0.

Signed-off-by: Gu Mi <gumi@xxxxxxxxxxxxxxxxx>
---
v2->v3: the workaround is the same as in the v1 patch. The change in
blk_add_timer() prevents a valid deadline from being equal to 0, which
guarantees that deadline == 0 in blk_mq_req_expired() can only be the
invalid value 0 set at request initialization.

 block/blk-mq.c      | 2 ++
 block/blk-timeout.c | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bbf5434..f36280b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -947,6 +947,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 		return false;
 
 	deadline = READ_ONCE(rq->deadline);
+	if (unlikely(deadline == 0))
+		return false;
 	if (time_after_eq(jiffies, deadline))
 		return true;
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 1b8de041..9e1c00c 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -140,6 +140,11 @@ void blk_add_timer(struct request *req)
 	req->rq_flags &= ~RQF_TIMED_OUT;
 
 	expiry = jiffies + req->timeout;
+#ifndef CONFIG_64BIT
+/* In case INITIAL_JIFFIES wraps on 32-bit */
+	if (expiry == 0)
+		expiry |= 1UL;
+#endif
 	WRITE_ONCE(req->deadline, expiry);
 
 	/*
-- 
1.8.3.1
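
For readers who want to try the deadline logic outside the kernel, the following is a minimal user-space sketch of the two checks described in the commit message. It is not kernel code. All `sketch_*` names are hypothetical, `sketch_time_after_eq()` is a local stand-in for the kernel's `time_after_eq()`, the current time is passed in as `now` instead of being read from `jiffies`, and the zero-deadline guard is applied unconditionally rather than only under `#ifndef CONFIG_64BIT`.

```c
#include <stdbool.h>
#include <limits.h>

/* Stand-in for the kernel's time_after_eq(): wrap-safe "a >= b". */
#define sketch_time_after_eq(a, b) ((long)((a) - (b)) >= 0)

/* Hypothetical, reduced request states for this sketch. */
enum sketch_rq_state { SKETCH_RQ_IDLE, SKETCH_RQ_IN_FLIGHT };

/* Hypothetical, reduced request: only the fields the check needs. */
struct sketch_request {
	enum sketch_rq_state state;
	unsigned long deadline;	/* 0 means "not armed yet" */
	unsigned long timeout;
};

/*
 * Models the patched blk_add_timer(): if now + timeout wraps to exactly 0,
 * set the lowest bit so a valid deadline is never 0.
 * Assumption: applied unconditionally here, not only on 32-bit.
 */
static unsigned long sketch_arm_deadline(struct sketch_request *rq,
					 unsigned long now)
{
	unsigned long expiry = now + rq->timeout;

	if (expiry == 0)
		expiry |= 1UL;
	rq->deadline = expiry;
	return expiry;
}

/*
 * Models the patched blk_mq_req_expired(): an in-flight request whose
 * deadline is still 0 has not been armed yet, so it is not reported as
 * expired.
 */
static bool sketch_req_expired(const struct sketch_request *rq,
			       unsigned long now)
{
	if (rq->state != SKETCH_RQ_IN_FLIGHT)
		return false;
	if (rq->deadline == 0)
		return false;
	return sketch_time_after_eq(now, rq->deadline);
}
```

Without the `deadline == 0` guard, a request that is already in flight but still has `deadline == 0` would satisfy `sketch_time_after_eq(now, 0)` for any small `now` and be reported as expired, which is the misjudgment the patch describes.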