Re: [PATCH] blk-mq: fix a hung issue when fsync

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 17 Feb 2019 12:28:19 -0500

On Sun, Feb 17, 2019 at 04:37:29PM +0100, Thibaut Sautereau wrote:
On Wed, Jan 30, 2019 at 08:54:09AM -0700, Jens Axboe wrote:
On 1/30/19 2:01 AM, Jianchao Wang wrote:
> Florian reported an I/O hang during fsync(). It is likely
> triggered by the following race condition.
>
> data + post flush         a flush
>
> blk_flush_complete_seq
>   case REQ_FSEQ_DATA
>     blk_flush_queue_rq
>     issued to driver      blk_mq_dispatch_rq_list
>                             try to issue a flush req
>                             failed due to NON-NCQ command
>                             .queue_rq returns BLK_STS_DEV_RESOURCE
>
> request completion
>   req->end_io // doesn't check RESTART
>   mq_flush_data_end_io
>     case REQ_FSEQ_POSTFLUSH
>       blk_kick_flush
>         do nothing because previous flush
>         has not been completed
>      blk_mq_run_hw_queue
>                               insert rq to hctx->dispatch
>                               RESTART is still set, so do nothing
>
> To fix this, replace the blk_mq_run_hw_queue call in mq_flush_data_end_io
> with blk_mq_sched_restart, which checks and clears the RESTART flag.
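
For readers who want to step through the interleaving above, here is a minimal userspace sketch in C. It is a toy model and not kernel code: struct toy_hctx, toy_run_queue() and toy_flush_stuck() are hypothetical names invented for illustration, and the model assumes the RESTART flag is already set when the flush fails to issue, as the diagram implies. It replays the two columns in a fixed order and reports whether the flush request is left stranded on the dispatch list.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one hardware queue context (hypothetical, not the kernel struct). */
struct toy_hctx {
    bool restart;   /* models the RESTART flag */
    int  dispatch;  /* number of requests parked on the dispatch list */
};

/* Models a queue run: issues whatever is parked, does nothing if the list is empty. */
static void toy_run_queue(struct toy_hctx *h)
{
    if (h->dispatch > 0)
        h->dispatch = 0;
}

/* Which call the data request's end_io handler makes in this model. */
enum toy_end_io {
    TOY_RUN_HW_QUEUE,   /* old behaviour: plain queue run, RESTART untouched */
    TOY_SCHED_RESTART   /* fixed behaviour: check and clear RESTART, then run */
};

/* Replays the race in the order shown in the diagram; returns true if the flush hangs. */
static bool toy_flush_stuck(enum toy_end_io mode)
{
    /* Assumption: RESTART was set before the flush failed to issue. */
    struct toy_hctx h = { .restart = true, .dispatch = 0 };

    /* Right column, first half: the flush was rejected by the driver
     * but has not been parked on the dispatch list yet. */
    assert(h.dispatch == 0);

    /* Left column: the data request completes and its end_io handler runs. */
    if (mode == TOY_RUN_HW_QUEUE) {
        toy_run_queue(&h);          /* list is still empty, so nothing happens */
    } else if (h.restart) {
        h.restart = false;          /* check and clear the flag */
        toy_run_queue(&h);
    }

    /* Right column, second half: park the flush, then rerun the queue
     * only if no completion is expected to do it (RESTART not set). */
    h.dispatch++;
    if (!h.restart)
        toy_run_queue(&h);

    return h.dispatch > 0;
}
```

In this model, toy_flush_stuck(TOY_RUN_HW_QUEUE) leaves the request parked because the flag is never cleared, while toy_flush_stuck(TOY_SCHED_RESTART) clears the flag so the dispatcher reruns the queue itself after parking the request.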

Applied, thanks.

--
Jens Axboe

Can this be applied to stable kernels please?

It's commit 85bd6e61f34dffa8ec2dc75ff3c02ee7b2f1cbce upstream.

I've queued it for 4.20, 4.19 and 4.14.

--
Thanks,
Sasha