Re: [PATCH 4/4] block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding requests to hctx->dispatch

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 23 Aug 2024 19:27:57 +0800

On Sun, Aug 11, 2024 at 06:19:21PM +0800, Muchun Song wrote:
> Supposing the following scenario.
> 
> CPU0                                                                CPU1
> 
> blk_mq_request_issue_directly()                                     blk_mq_unquiesce_queue()
>     if (blk_queue_quiesced())                                           blk_queue_flag_clear(QUEUE_FLAG_QUIESCED)   3) store
>         blk_mq_insert_request()                                         blk_mq_run_hw_queues()
>             /*                                                              blk_mq_run_hw_queue()
>              * Add request to dispatch list or set bitmap of                    if (!blk_mq_hctx_has_pending())     4) load
>              * software queue.                  1) store                            return
>              */
>         blk_mq_run_hw_queue()
>             if (blk_queue_quiesced())           2) load
>                 return
>             blk_mq_sched_dispatch_requests()
> 
> The full memory barrier should be inserted between 1) and 2), as well as
> between 3) and 4) to make sure that either CPU0 sees QUEUE_FLAG_QUIESCED is
> cleared or CPU1 sees dispatch list or setting of bitmap of software queue.
> Otherwise, either CPU will not re-run the hardware queue causing starvation.

A memory barrier shouldn't serve as the bug fix for two slow code paths.

One simple fix is to add a helper, blk_queue_quiesced_lock(), and
do the following check on CPU0:

	if (blk_queue_quiesced_lock())
		blk_mq_run_hw_queue();
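
To make the idea concrete, here is a rough userspace sketch of what such
a helper could look like. It is only a model, not kernel code: struct
model_queue, model_quiesce(), model_unquiesce() and
model_queue_quiesced_lock() are hypothetical names, a pthread mutex
stands in for q->queue_lock, and it assumes the unquiesce side changes
the flag while holding that same lock. Under that assumption, reading
the flag under the lock orders the check against the flag update without
adding an explicit barrier to either path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct request_queue. */
struct model_queue {
	pthread_mutex_t lock;	/* plays the role of q->queue_lock */
	int quiesce_depth;	/* nested quiesce count */
	bool quiesced;		/* plays the role of QUEUE_FLAG_QUIESCED */
};

void model_queue_init(struct model_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->quiesce_depth = 0;
	q->quiesced = false;
}

/* Set the flag on the first quiesce; nested calls only bump the depth. */
void model_quiesce(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->quiesce_depth++ == 0)
		q->quiesced = true;
	pthread_mutex_unlock(&q->lock);
}

/* Clear the flag under the lock once the last quiesce is dropped. */
void model_unquiesce(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	assert(q->quiesce_depth > 0);	/* unbalanced unquiesce is a bug */
	if (--q->quiesce_depth == 0)
		q->quiesced = false;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Assumed shape of the proposed helper: sample the flag while holding
 * the same lock the unquiesce side uses to change it.
 */
bool model_queue_quiesced_lock(struct model_queue *q)
{
	bool quiesced;

	pthread_mutex_lock(&q->lock);
	quiesced = q->quiesced;
	pthread_mutex_unlock(&q->lock);
	return quiesced;
}
```

The extra lock round trip is only paid on the slow path, which is why it
looks preferable to a barrier here.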

thanks,
Ming