Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

Bart Van Assche <Bart.VanAssche@xxxxxxx> · Fri, 13 Jul 2018 15:52:38 +0000

On Fri, 2018-07-13 at 09:43 -0600, Keith Busch wrote:
> On Thu, Jul 12, 2018 at 10:24:42PM +0000, Bart Van Assche wrote:
> > Before commit 12f5b9314545 ("blk-mq: Remove generation seqeunce"), if a
> > request completion was reported after request timeout processing had
> > started, completion handling was skipped. The following code in
> > blk_mq_complete_request() realized that:
> > 
> > 	if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
> > 		__blk_mq_complete_request(rq);
> > 
> > Since commit 12f5b9314545, if a completion occurs after request timeout
> > processing has started, that completion is processed if the request has the
> > state MQ_RQ_IN_FLIGHT. blk_mq_rq_timed_out() does not modify the request
> > state unless the block driver timeout handler modifies it, e.g. by calling
> > blk_mq_end_request() or by calling blk_mq_requeue_request(). The typical
> > behavior of scsi_times_out() is to queue sending of a SCSI abort and hence
> > not to change the request state immediately. In other words, if a request
> > completion occurs during or shortly after a timeout occurred then
> > blk_mq_complete_request() will call __blk_mq_complete_request() and will
> > complete the request, although that is not allowed because timeout handling
> > has already started. Do you agree with this analysis?
> 
> Yes, it's different, and that was the whole point. No one made that a
> secret either. Are you saying you want timeout software to take priority
> over handling hardware events?

No. What I'm saying is that a behavior change has been introduced in the block
layer that was not documented in the patch description. I think that behavior
change even can trigger a kernel crash.

Bart.