Re: [PATCH] block: mq-deadline: Fix queue restart handling

Jens Axboe <axboe@xxxxxxxxx> · Tue, 3 Sep 2019 08:00:03 -0600

On 8/27/19 10:40 PM, Damien Le Moal wrote:
> Commit 7211aef86f79 ("block: mq-deadline: Fix write completion
> handling") added a call to blk_mq_sched_mark_restart_hctx() in
> dd_dispatch_request() to make sure that write request dispatching does
> not stall when all target zones are locked. This fix left a subtle race
> when a write request completes on one CPU while dispatch is running on
> another CPU:
> 
> CPU 0: Dispatch			CPU 1: write completion
> 
> dd_dispatch_request()
>      lock(&dd->lock);
>      ...
>      lock(&dd->zone_lock);	dd_finish_request()
>      rq = find request		lock(&dd->zone_lock);
>      unlock(&dd->zone_lock);
>      				zone write unlock
> 				unlock(&dd->zone_lock);
> 				...
> 				__blk_mq_free_request
>                                        check restart flag (not set)
> 				      -> queue not run
>      ...
>      if (!rq && have writes)
>          blk_mq_sched_mark_restart_hctx()
>      unlock(&dd->lock)
> 
> Since the dispatch context marks the queue as needing a restart only
> after the write request completion handling is done, the flag is not
> seen by __blk_mq_free_request(), blk_mq_sched_restart() is not
> executed, and dispatch stalls under 100% write workloads.
> 
> Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
> dd_dispatch_request() into dd_finish_request() under the zone lock to
> ensure full mutual exclusion between write request dispatch selection
> and zone unlock on write request completion.
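The interleaving in the diagram can be reduced to a single-threaded toy model to show why the restart is lost and why moving the marking into the completion path closes the window. This is a hypothetical sketch, not mq-deadline or blk-mq code: toy_hctx, toy_pick_write(), toy_free_request(), toy_old_race() and toy_fixed() are illustrative names, there is assumed to be a single zone with a single in-flight write, and locking is replaced by running each CPU's steps in the order shown above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for one hardware queue and one sequential-write zone. */
struct toy_hctx {
	bool zone_locked;	/* zone write lock held by an in-flight write */
	int queued_writes;	/* writes waiting in the deadline FIFO */
	bool restart;		/* stands in for the hctx "needs restart" flag */
	int queue_runs;		/* times the completion path reran the queue */
};

/* Dispatch selection: a write can be picked only if its zone is unlocked. */
static bool toy_pick_write(struct toy_hctx *h)
{
	if (h->queued_writes == 0 || h->zone_locked)
		return false;
	h->queued_writes--;
	h->zone_locked = true;	/* the dispatched write now owns the zone */
	return true;
}

/* Models the restart check done when a completed request is freed. */
static void toy_free_request(struct toy_hctx *h)
{
	if (h->restart) {
		h->restart = false;
		h->queue_runs++;
		toy_pick_write(h);	/* the rerun dispatch picks the write */
	}
}

/* Old ordering: CPU 0 marks restart only after CPU 1 already checked
 * the flag. Returns the number of writes left stuck in the FIFO. */
static int toy_old_race(void)
{
	struct toy_hctx h = { .zone_locked = true, .queued_writes = 1 };
	bool got = toy_pick_write(&h);	/* CPU 0: zone locked, nothing picked */

	/* CPU 1 runs its whole completion here. */
	h.zone_locked = false;		/* zone write unlock */
	toy_free_request(&h);		/* flag not set yet: queue not run */

	/* CPU 0 resumes. */
	if (!got && h.queued_writes > 0)
		h.restart = true;	/* too late: no completion left to see it */
	return h.queued_writes;
}

/* Fixed ordering: the completion marks restart together with the zone
 * unlock, before the free path checks the flag. */
static int toy_fixed(void)
{
	struct toy_hctx h = { .zone_locked = true, .queued_writes = 1 };
	bool got = toy_pick_write(&h);	/* CPU 0: nothing picked, no marking */

	assert(!got);
	h.zone_locked = false;		/* zone write unlock ... */
	if (h.queued_writes > 0)	/* ... and, under the same lock, mark */
		h.restart = true;
	toy_free_request(&h);		/* flag seen: queue rerun, write picked */
	return h.queued_writes;
}
```

In this model toy_old_race() leaves one write in the FIFO with the restart flag set but no completion pending to act on it, while toy_fixed() drains the FIFO because the flag is set before the free path looks at it.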

Applied, thanks.

-- 
Jens Axboe