Re: [PATCH] blk-mq: fix a hung issue when fsync

Thibaut Sautereau <thibaut@xxxxxxxxxxxx> · Sun, 17 Feb 2019 16:37:29 +0100

On Wed, Jan 30, 2019 at 08:54:09AM -0700, Jens Axboe wrote:
> On 1/30/19 2:01 AM, Jianchao Wang wrote:
> > Florian reported an I/O hang when calling fsync(). It appears to be
> > triggered by the following race condition.
> > 
> > data + post flush         a flush
> > 
> > blk_flush_complete_seq
> >   case REQ_FSEQ_DATA
> >     blk_flush_queue_rq
> >     issued to driver      blk_mq_dispatch_rq_list
> >                             try to issue a flush req
> >                             failed due to NON-NCQ command
> >                             .queue_rq return BLK_STS_DEV_RESOURCE
> > 
> > request completion
> >   req->end_io // doesn't check RESTART
> >   mq_flush_data_end_io
> >     case REQ_FSEQ_POSTFLUSH
> >       blk_kick_flush
> >         do nothing because previous flush
> >         has not been completed
> >     blk_mq_run_hw_queue
> >                             insert rq to hctx->dispatch
> >                             RESTART is still set, so do nothing
> > 
> > To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
> > with blk_mq_sched_restart to check and clear the RESTART flag.
> 
> Applied, thanks.
> 
> -- 
> Jens Axboe

Can this be applied to stable kernels please?

It's commit 85bd6e61f34dffa8ec2dc75ff3c02ee7b2f1cbce upstream.
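
For anyone reviewing the backport, the interleaving in the quoted commit
message can be modeled in userspace roughly as below. This is only a toy
sketch under simplifying assumptions, not kernel code: every name prefixed
with toy_ is hypothetical, the RESTART flag is a plain bool, and the two
contexts are run back to back in the problematic order rather than
concurrently.

```c
#include <stdbool.h>

/* Toy stand-in for a hardware queue context (hypothetical, not the kernel struct). */
struct toy_hctx {
    bool restart;   /* models the RESTART flag */
    int dispatch;   /* requests parked on the dispatch list */
    int issued;     /* requests handed to the driver */
};

/* Models running the hardware queue: drain whatever is parked right now. */
static void toy_run_queue(struct toy_hctx *h)
{
    h->issued += h->dispatch;
    h->dispatch = 0;
}

/* Old completion path: runs the queue but never looks at RESTART. */
static void toy_end_io_old(struct toy_hctx *h)
{
    toy_run_queue(h);
}

/* Fixed completion path: check and clear RESTART, then run the queue. */
static void toy_end_io_fixed(struct toy_hctx *h)
{
    if (h->restart) {
        h->restart = false;
        toy_run_queue(h);
    }
}

/*
 * Tail of the dispatcher after the driver refused the flush request:
 * park the request, then rerun the queue only if RESTART is not set
 * (otherwise it relies on a completion to restart it).
 */
static void toy_dispatch_tail(struct toy_hctx *h)
{
    h->dispatch++;
    if (!h->restart)
        toy_run_queue(h);
}

/*
 * Replays the race: the completion fires after the dispatcher has set
 * RESTART but before it has parked the request. Returns the number of
 * requests left stranded on the dispatch list.
 */
static int toy_race(bool fixed)
{
    struct toy_hctx h = { .restart = true, .dispatch = 0, .issued = 0 };

    if (fixed)
        toy_end_io_fixed(&h);
    else
        toy_end_io_old(&h);
    toy_dispatch_tail(&h);

    return h.dispatch;
}
```

In this model, toy_race(false) leaves one request stranded because the
queue run happens before the request is parked and RESTART stays set,
while toy_race(true) leaves none because the dispatcher sees RESTART
cleared and reruns the queue itself.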

Thanks,

-- 
Thibaut