On Thu, Nov 29, 2018 at 06:11:59PM +0100, Christoph Hellwig wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index a82830f39933..d0ef540711c7 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
> >
> >  int blk_mq_request_started(struct request *rq)
> >  {
> > -	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
> > +	return blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT;
> >  }
> >  EXPORT_SYMBOL_GPL(blk_mq_request_started);
>
> Independent of this series, this change looks like the right thing to
> do.  But this whole area is a mine field, so separate testing would be
> very helpful.
>
> I also wonder why we even bother with the above helper; a direct
> state comparison seems a lot more obvious to the reader.

I think it's just because blk_mq_rq_state() is a private interface.
The enum mq_rq_state is defined under include/linux/, so it looks okay
to make reading the state public too (rough sketch at the end of this
mail).

> Last but not least the blk_mq_request_started check in nbd
> should probably be lifted into blk_mq_tag_to_rq while we're at it..
>
> As for the nvme issue - it seems to me like we need to decouple the
> nvme loop frontend request from the target backend request.  In case
> of a timeout/reset we'll complete struct request like all other nvme
> transport drivers, but we leave the backend target state around, which
> will be freed when it completes (or leaks when the completion is lost).

I don't think nvme's loop target should do anything to help a command
complete.  It shouldn't even implement a timeout, for the same reason
no stacking block driver implements one.  If a request is stuck, the
lowest-level driver is the only one that should have the responsibility
to make it unstuck.
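
For illustration, a minimal sketch of what a public state accessor
could look like.  This is not part of the series and glosses over
header-ordering details; it just assumes the existing private helper in
block/blk-mq.h (which reads rq->state with READ_ONCE()) could move into
include/linux/blk-mq.h next to enum mq_rq_state.  The caller below is
hypothetical.

	/*
	 * Sketch: move the private helper from block/blk-mq.h into
	 * include/linux/blk-mq.h so drivers can read the request state
	 * directly instead of going through blk_mq_request_started().
	 */
	static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
	{
		return READ_ONCE(rq->state);
	}

	/*
	 * Hypothetical driver abort path: only complete requests that
	 * have actually been started and are still in flight.
	 */
	static void example_abort_rq(struct request *rq)
	{
		if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
			blk_mq_complete_request(rq);
	}

With something like that exported, the open-coded comparison Christoph
suggests reads just as clearly in drivers as it does in block/blk-mq.c.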