Re: [blktests] zbd/012: Test requeuing of zoned writes and queue freezing

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Wed, 27 Nov 2024 00:58:16 -0800

On Wed, Nov 27, 2024 at 05:17:08PM +0900, Damien Le Moal wrote:
> After all these fixes, the last remaining problem is the zone write
> plug error recovery issuing a report zone which can block if a queue 
> freeze was initiated.
>
> That can prevent forward progress and hang the freeze caller. I do not
> see any way to avoid that report zones command. I think this could be fixed with
> a magic BLK_MQ_REQ_INTERNAL flag passed to blk_mq_alloc_request() and
> propagated to blk_queue_enter() to forcefully take a queue usage counter
> reference even if a queue freeze was started. That would ensure forward
> progress (i.e. scsi_execute_cmd() or the NVMe equivalent would not block
> forever). Need to think more about that.
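
The forward-progress problem in the quoted text, and the suggested BLK_MQ_REQ_INTERNAL escape hatch, can be modelled in a few lines of userspace C. This is a toy sketch, not kernel code. `struct toy_queue`, `toy_queue_enter()`, `toy_freeze_done()` and the `TOY_REQ_INTERNAL` flag are hypothetical stand-ins for the queue usage counter, blk_queue_enter() and the proposed flag. Blocking is modelled as a false return rather than a sleep, and the model assumes that the pending zoned writes still hold usage references, so the freeze cannot complete until error recovery has run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag modelling the proposed BLK_MQ_REQ_INTERNAL. */
#define TOY_REQ_INTERNAL 0x1u

/* Toy model of a request queue: a usage count plus a freeze depth. */
struct toy_queue {
	int usage;        /* outstanding usage references */
	int freeze_depth; /* > 0 once a freeze has been started */
};

/*
 * Try to take a usage reference. Returns false where the real code
 * would sleep until the queue is unfrozen, unless the caller passes
 * TOY_REQ_INTERNAL, in which case the reference is taken anyway.
 */
static bool toy_queue_enter(struct toy_queue *q, unsigned int flags)
{
	if (q->freeze_depth > 0 && !(flags & TOY_REQ_INTERNAL))
		return false;
	q->usage++;
	return true;
}

/* Drop a usage reference taken by toy_queue_enter(). */
static void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

static void toy_freeze_start(struct toy_queue *q)
{
	q->freeze_depth++;
}

/* A freeze completes only once every usage reference is dropped. */
static bool toy_freeze_done(const struct toy_queue *q)
{
	return q->freeze_depth > 0 && q->usage == 0;
}
```

With one pending write holding a reference and a freeze started, `toy_queue_enter(&q, 0)` fails: the report zones would wait for the freeze while the freeze waits for the pending write, which is the hang described above. Passing `TOY_REQ_INTERNAL` lets recovery proceed, but only by bypassing the freeze protection, which is the objection raised in the reply below.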

You are talking about disk_zone_wplug_handle_error here, right?

We should not issue a report zones to a frozen queue, as that would
bypass the freezing protection.  I suspect the right thing is to
simply defer the error recovery action until after the queue is
unfrozen.
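
Deferring recovery could look roughly like the toy model below. While a freeze is in progress the error handler only flags the zone, and the unfreeze path then runs whatever was deferred. Everything in it (`struct toy_disk`, `toy_handle_error()`, `toy_unfreeze()`, the `recovery_pending` flag and the report-zones counter) is hypothetical and only illustrates the ordering. It is not how blk-zoned.c implements it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy disk state: freeze status plus one zone's error-recovery state. */
struct toy_disk {
	bool frozen;             /* a queue freeze is in progress */
	bool recovery_pending;   /* zone needs a report zones to resync */
	int report_zones_issued; /* how many recoveries actually ran */
};

/* Stand-in for issuing report zones and resyncing the write pointer. */
static void toy_do_recovery(struct toy_disk *d)
{
	d->report_zones_issued++;
	d->recovery_pending = false;
}

/*
 * Error handler: never issue report zones to a frozen queue.
 * Record that recovery is needed and run it now only if unfrozen.
 */
static void toy_handle_error(struct toy_disk *d)
{
	d->recovery_pending = true;
	if (!d->frozen)
		toy_do_recovery(d);
}

static void toy_freeze(struct toy_disk *d)
{
	d->frozen = true;
}

/* Unfreezing runs any recovery that was deferred during the freeze. */
static void toy_unfreeze(struct toy_disk *d)
{
	d->frozen = false;
	if (d->recovery_pending)
		toy_do_recovery(d);
}
```

Keeping the pending flag separate from the frozen state means an error that arrives during a freeze is not lost; it is replayed once the freeze caller has finished.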

I wonder if the separate error work handler should go away.  Instead,
blk_zone_wplug_bio_work should always check for an error first
and, if there is one, do the report zones.  And blk_zone_wplug_handle_write
would always defer to the work queue if there was an error.
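
The restructuring proposed in the last paragraph amounts to a single per-zone work function whose first step is error handling, with the submission path punting to it whenever the zone is in error. The toy model below sketches that control flow under heavy simplification. `struct toy_plug`, `toy_handle_write()`, `toy_bio_work()` and the counters are hypothetical stand-ins for the zone write plug, blk_zone_wplug_handle_write() and blk_zone_wplug_bio_work(), with plain integers instead of BIOs and a flag instead of an actual queue_work() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy zone write plug: error state, queued writes and a work flag. */
struct toy_plug {
	bool error;          /* a write to this zone failed */
	bool work_scheduled; /* stand-in for the work item being queued */
	int plugged;         /* writes waiting in the plug */
	int submitted;       /* writes sent to the device */
	int report_zones;    /* report zones issued for recovery */
};

/* Submission path: in the error case, always defer to the work item. */
static void toy_handle_write(struct toy_plug *p)
{
	if (p->error) {
		p->plugged++;
		p->work_scheduled = true;
		return;
	}
	p->submitted++;
}

/*
 * The single work function: check for an error first and recover
 * with report zones, then submit whatever writes were plugged.
 */
static void toy_bio_work(struct toy_plug *p)
{
	p->work_scheduled = false;
	if (p->error) {
		p->report_zones++;
		p->error = false;
	}
	while (p->plugged > 0) {
		p->plugged--;
		p->submitted++;
	}
}
```

One consequence of this shape is that only the work item ever issues report zones, which would give a single place to wait for the queue to be unfrozen before doing so.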