On Wed, Jan 30, 2019 at 08:54:09AM -0700, Jens Axboe wrote:
> On 1/30/19 2:01 AM, Jianchao Wang wrote:
> > Florian reported a io hung issue when fsync(). It should be
> > triggered by following race condition.
> >
> > data + post flush         a flush
> >
> > blk_flush_complete_seq
> >   case REQ_FSEQ_DATA
> >     blk_flush_queue_rq
> >     issued to driver      blk_mq_dispatch_rq_list
> >                             try to issue a flush req
> >                             failed due to NON-NCQ command
> >                             .queue_rq return BLK_STS_DEV_RESOURCE
> >
> > request completion
> >   req->end_io // doesn't check RESTART
> >   mq_flush_data_end_io
> >     case REQ_FSEQ_POSTFLUSH
> >       blk_kick_flush
> >         do nothing because previous flush
> >         has not been completed
> >     blk_mq_run_hw_queue
> >                           insert rq to hctx->dispatch
> >                           due to RESTART is still set, do nothing
> >
> > To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
> > with blk_mq_sched_restart to check and clear the RESTART flag.
>
> Applied, thanks.
>
> --
> Jens Axboe

Can this be applied to stable kernels please?

It's commit 85bd6e61f34dffa8ec2dc75ff3c02ee7b2f1cbce upstream.

Thanks,

--
Thibaut