On Wed, 2018-05-16 at 18:24 +0200, hch@xxxxxx wrote: > On Wed, May 16, 2018 at 04:17:42PM +0000, Bart Van Assche wrote: > > There is another reason the deadline is included in the atomic operation, > > namely to handle races between the BLK_EH_RESET_TIMER case in blk_mq_rq_timed_out() > > and blk_mq_complete_request(). I don't think that race is addressed properly by > > your patch. I will see what I can do to address that race without using 64-bit > > atomic operations. > > I might be missing something here, so please help me understand > what is missing. > > If we restart the timer in blk_mq_rq_timed_out we also bump the > generation at the same time as we reset the deadline in your old > patch. With this patch we only bump the generation, but that should > be enough to address the rest, or not? Hello Christoph, I think your patch changes the order of changing the request state and calling mod_timer(). In my patch the request state and the deadline are updated first and mod_timer() is called afterwards. I think your patch changes the order of these operations into the following: (1) __blk_mq_start_request() sets the request deadline. (2) __blk_mq_start_request() calls __blk_add_timer() which in turn calls mod_timer(). (3) __blk_mq_start_request() changes the request state into MQ_RQ_IN_FLIGHT. In the unlikely event of a significant delay between (2) and (3) it can happen that the timer fires and examines and ignores the request because its state differs from MQ_RQ_IN_FLIGHT. If the request for which this happened times out its timeout will only be handled the next time blk_mq_timeout_work() is called. Is this the behavior you intended? Thanks, Bart.