On 8/13/23 7:36 AM, chengming.zhou@xxxxxxxxx wrote: > From: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> > > Chuck reported [1] a IO hang problem on NFS exports that reside on SATA > devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal > submission path for post-flush requests"). > > We analysed the IO hang problem, found there are two postflush requests > are waiting for each other. > > The first postflush request completed the REQ_FSEQ_DATA sequence, so go to > the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but > failed to blk_kick_flush() because of the second postflush request which > is inflight waiting in scheduler queue. > > The second postflush waiting in scheduler queue can't be dispatched because > the first postflush hasn't released scheduler resource even though it has > completed by itself. > > Fix it by releasing scheduler resource when the first postflush request > completed, so the second postflush can be dispatched and completed, then > make blk_kick_flush() succeed. Change looks good to me. But since we're in here: > diff --git a/block/blk-mq.c b/block/blk-mq.c > index f14b8669ac69..5b14f18f9670 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -682,6 +682,15 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, > } > EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx); > > +static void blk_mq_finish_request(struct request *rq) > +{ > + struct request_queue *q = rq->q; > + > + if ((rq->rq_flags & RQF_USE_SCHED) && > + q->elevator->type->ops.finish_request) > + q->elevator->type->ops.finish_request(rq); > +} Any IO scheduler should set ->finish_request(), so this should just be: static void blk_mq_finish_request(struct request *rq) { struct request_queue *q = rq->q; if (rq->rq_flags & RQF_USE_SCHED) q->elevator->type->ops.finish_request(rq); } and would probably be a good idea to solidify that with a: if (WARN_ON_ONCE(!e->ops.finish_request)) return -EINVAL; at the top of elv_register() like we have for insert/dispatch as well. All 3 IO schedulers do set ->finish_request(). -- Jens Axboe