Hi Ming.

Ming Lei - 15.04.18, 17:43:
> Hi Jens,
>
> These two patches fix the recently discussed race between completion
> and BLK_EH_RESET_TIMER.
>
> Israel & Martin, this one is a simpler fix on this issue and can
> cover the potential hang of MQ_RQ_COMPLETE_IN_TIMEOUT request, could
> you test V4 and see if your issue can be fixed?

In replacement of all the three other patches I applied?

- '[PATCH] blk-mq_Directly schedule q->timeout_work when aborting a request.mbox'
- '[PATCH v2] block: Change a rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}().mbox'
- '[PATCH v4] blk-mq_Fix race conditions in request timeout handling.mbox'

These patches have worked reliably so far, both for the hang on boot and
for the error reading SMART data.

I'd compile a kernel tomorrow or Tuesday I think.

> V4:
> - run synchronize_rcu() once for handling all timed out request
>   between .timeout() and the following handling
> - address tj's concern about reorder between blk_add_timer() and
>   blk_mq_rq_update_aborted_gstate(req, 0)
>
> V3:
> - before completing rq for BLK_EH_HANDLED, sync with normal
>   completion path
> - make sure rq's state updated as MQ_RQ_IN_FLIGHT before completing
>
> V2:
> - rename the new flag as MQ_RQ_COMPLETE_IN_TIMEOUT
> - fix lock uses in blk_mq_rq_timed_out
> - document re-order between blk_add_timer() and
>   blk_mq_rq_update_aborted_gstate(req, 0)
>
>
> Ming Lei (2):
>   blk-mq: set RQF_MQ_TIMEOUT_EXPIRED when the rq's timeout isn't handled
>   blk-mq: fix race between complete and BLK_EH_RESET_TIMER
>
>  block/blk-mq.c         | 120 +++++++++++++++++++++++++++++++++++++++----------
>  block/blk-mq.h         |   1 +
>  block/blk-timeout.c    |   1 -
>  include/linux/blkdev.h |   6 +++
>  4 files changed, 104 insertions(+), 24 deletions(-)

--
Martin