On 8/27/19 10:40 PM, Damien Le Moal wrote:
> Commit 7211aef86f79 ("block: mq-deadline: Fix write completion
> handling") added a call to blk_mq_sched_mark_restart_hctx() in
> dd_dispatch_request() to make sure that write request dispatching does
> not stall when all target zones are locked. This fix left a subtle race
> when a write completion happens during a dispatch execution on another
> CPU:
>
> CPU 0: Dispatch                       CPU 1: write completion
>
> dd_dispatch_request()
>     lock(&dd->lock);
>     ...
>     lock(&dd->zone_lock);             dd_finish_request()
>     rq = find request                 lock(&dd->zone_lock);
>     unlock(&dd->zone_lock);
>                                       zone write unlock
>                                       unlock(&dd->zone_lock);
>                                       ...
>                                       __blk_mq_free_request
>                                       check restart flag (not set)
>                                       -> queue not run
>     ...
>     if (!rq && have writes)
>         blk_mq_sched_mark_restart_hctx()
>     unlock(&dd->lock)
>
> Since the dispatch context finishes after the write request completion
> handling, marking the queue as needing a restart is not seen from
> __blk_mq_free_request() and blk_mq_sched_restart() is not executed,
> leading to a dispatch stall under 100% write workloads.
>
> Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
> dd_dispatch_request() into dd_finish_request() under the zone lock to
> ensure full mutual exclusion between write request dispatch selection
> and zone unlock on write request completion.

Applied, thanks.

-- 
Jens Axboe
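
For readers following the interleaving in the quoted diagram, here is a rough userspace sketch that replays those steps in order, once with the restart marked at the end of dispatch and once with it marked in the completion path as the patch describes. It is not kernel code: struct sched_model, simulate_stalls() and the helper names are hypothetical, and the model assumes a single zone and a single queued write.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the interleaving; not kernel code. */
struct sched_model {
    bool zone_locked;     /* target zone is write-locked by an in-flight write */
    bool restart_marked;  /* stands in for the hctx restart flag */
    bool queue_rerun;     /* completion saw the flag and re-ran the queue */
    int  queued_writes;   /* writes still waiting in the scheduler */
};

/* CPU 0, first half: a write can be picked only if its zone is unlocked. */
static bool dispatch_select(const struct sched_model *m)
{
    return m->queued_writes > 0 && !m->zone_locked;
}

/* CPU 1: unlock the zone; with the fix, mark the restart in the same step. */
static void finish_request(struct sched_model *m, bool fix_applied)
{
    m->zone_locked = false;
    if (fix_applied && m->queued_writes > 0)
        m->restart_marked = true;
}

/* CPU 1: freeing the request re-runs the queue only if the flag is set. */
static void free_request(struct sched_model *m)
{
    if (m->restart_marked) {
        m->restart_marked = false;
        m->queue_rerun = true;
    }
}

/* CPU 0, second half (pre-fix only): mark the restart after finding nothing. */
static void dispatch_tail(struct sched_model *m, bool got_rq, bool fix_applied)
{
    if (!fix_applied && !got_rq && m->queued_writes > 0)
        m->restart_marked = true;
}

/* Replays the diagram's ordering; returns true if the queued write stalls. */
bool simulate_stalls(bool fix_applied)
{
    struct sched_model m = { .zone_locked = true, .queued_writes = 1 };

    bool got_rq = dispatch_select(&m);       /* CPU 0: zone locked, no rq */
    finish_request(&m, fix_applied);         /* CPU 1: zone write unlock */
    free_request(&m);                        /* CPU 1: check restart flag */
    dispatch_tail(&m, got_rq, fix_applied);  /* CPU 0: too late if pre-fix */

    return !got_rq && m.queued_writes > 0 && !m.queue_rerun;
}
```

In this model, simulate_stalls(false) reports a stall because the flag is set only after the completion path has already checked it, while simulate_stalls(true) does not, since the flag is set in the same step as the zone unlock.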