Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

Bart Van Assche <Bart.VanAssche@xxxxxxx> · Tue, 10 Apr 2018 15:02:11 +0000

On Tue, 2018-04-10 at 22:30 +0800, Ming Lei wrote:
> On Tue, Apr 10, 2018 at 02:09:33PM +0000, Bart Van Assche wrote:
> > Please keep in mind that all synchronize_rcu() does is to wait for pre-
> > existing RCU readers to finish. synchronize_rcu() does not prevent that new
> > rcu_read_lock() calls happen. It is e.g. possible that after
> 
> That is right, and I also mentioned normal completion can be done between
> 1) and reset aborted_gstate in 3).
> 
> > blk_mq_rq_update_aborted_gstate(req, 0) has been executed that a regular
> > completion occurs. If that request is not reused before the timer that was
> > restarted by the timeout code expires, that request will be completed twice.
> 
> In this patch, blk_mq_add_timer(req, MQ_RQ_COMPLETE, MQ_RQ_IN_FLIGHT) is
> called for handling BLK_EH_RESET_TIMER. And after rq's state is changed
> to MQ_RQ_IN_FLIGHT, normal completion still can come and complete this rq,
> just like the above you described, right?

I should have added the following in my previous e-mail: "if the completion
occurs after blk_mq_check_expired() examined rq->gstate and before it updated
rq->aborted_gstate". That race can occur with the current upstream blk-mq
timeout handling code but not after my patch has been applied.

Bart.