On 8/24/22 04:36, Gu Mi wrote:
The problem is a race between two asynchronous paths. One is a new IO
calling blk_mq_start_request() to start dispatch. The other is the
block layer timer calling blk_mq_req_expired() to check whether an IO
has timed out.
If the deadline store in blk_add_timer() and
WRITE_ONCE(rq->state, MQ_RQ_IN_FLIGHT) in blk_mq_start_request() are
reordered, the block timer can check the new IO for a timeout in
between. At that point the request state is already MQ_RQ_IN_FLIGHT
but req->deadline is still 0, so the new IO is misjudged as timed
out.
Our fix is to treat a deadline of 0 as "not timed out". Because
jiffies wraps shortly after boot on 32-bit systems, a valid deadline
can itself be 0 there, so on 32-bit we also set the low bit of the
deadline (adding at most 1 jiffy) to keep it nonzero.
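To make the two halves of the proposed fix concrete, here is a minimal userspace sketch, assuming a 32-bit jiffies counter modelled as uint32_t. The helpers time_after_eq32(), expired() and make_deadline() are hypothetical stand-ins written for this illustration; they only mirror the shape of the kernel's time_after_eq(), the patched blk_mq_req_expired() check and the 32-bit branch of the patched blk_add_timer().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is at or after b", modelled on the kernel's time_after_eq(). */
static bool time_after_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Hypothetical model of the patched expiry check. */
static bool expired(uint32_t now, uint32_t deadline)
{
	if (deadline == 0)	/* 0 is treated as "deadline not set yet" */
		return false;
	return time_after_eq32(now, deadline);
}

/* Hypothetical model of the 32-bit deadline computation in the patch. */
static uint32_t make_deadline(uint32_t now, uint32_t timeout)
{
	return (now + timeout) | 1u;	/* never 0, at most one jiffy late */
}

/* Small self-check showing the cases discussed in the commit message. */
static void self_check(void)
{
	uint32_t now = 1000;
	uint32_t near_wrap = (uint32_t)-30000;	/* 30000 ticks before wrap */

	assert(time_after_eq32(now, 0));	/* unpatched: 0 looks expired */
	assert(!expired(now, 0));		/* patched: 0 is ignored */
	assert(make_deadline(near_wrap, 30000) == 1);	/* sum wraps to 0 */
	assert(!expired(near_wrap, make_deadline(near_wrap, 30000)));
}
```

The sketch only shows why a deadline of 0 needs special handling once it is used as a sentinel. It does not model the store reordering itself.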
Signed-off-by: Gu Mi <gumi@xxxxxxxxxxxxxxxxx>
---
block/blk-mq.c | 2 ++
block/blk-timeout.c | 4 ++++
2 files changed, 6 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4b90d2d..6defaa1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1451,6 +1451,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 		return false;
 	deadline = READ_ONCE(rq->deadline);
+	if (unlikely(deadline == 0))
+		return false;
 	if (time_after_eq(jiffies, deadline))
 		return true;
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 1b8de041..6fc5088 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -140,6 +140,10 @@ void blk_add_timer(struct request *req)
 	req->rq_flags &= ~RQF_TIMED_OUT;
 	expiry = jiffies + req->timeout;
+#ifndef CONFIG_64BIT
+/* In case INITIAL_JIFFIES wraps on 32-bit */
+	expiry |= 1UL;
+#endif
 	WRITE_ONCE(req->deadline, expiry);
 	/*
Shouldn't this be fixed by inserting a barrier inside
blk_mq_start_request() instead of a patch like the above?
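As a rough illustration of what I mean, here is a userspace model using C11 atomics. It is a sketch under assumptions, not kernel code: struct model_rq, model_start_request() and model_req_expired() are hypothetical names, and in the kernel the equivalent ordering would presumably come from something like smp_store_release()/smp_load_acquire() or an smp_wmb()/smp_rmb() pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { RQ_IDLE = 0, RQ_IN_FLIGHT = 1 };	/* stand-ins for MQ_RQ_* states */

struct model_rq {			/* hypothetical, not struct request */
	atomic_ulong deadline;
	atomic_int state;
};

/* Writer side: publish the deadline before the state change. */
static void model_start_request(struct model_rq *rq, unsigned long expiry)
{
	atomic_store_explicit(&rq->deadline, expiry, memory_order_relaxed);
	/* Release store: the deadline store above cannot move after it. */
	atomic_store_explicit(&rq->state, RQ_IN_FLIGHT, memory_order_release);
}

/* Reader side: only trust the deadline after observing IN_FLIGHT. */
static bool model_req_expired(struct model_rq *rq, unsigned long now)
{
	unsigned long deadline;

	/* Acquire load pairs with the release store in the writer. */
	if (atomic_load_explicit(&rq->state, memory_order_acquire) != RQ_IN_FLIGHT)
		return false;
	deadline = atomic_load_explicit(&rq->deadline, memory_order_relaxed);
	return (long)(now - deadline) >= 0;	/* same shape as time_after_eq() */
}
```

With this pairing, a reader that sees the in-flight state also sees the deadline written before it, so no sentinel value for the deadline is needed. Note that the ordering is required on both the writer and the reader side.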
Thanks,
Bart.