On 11/20/24 06:04, Bart Van Assche wrote:
> On 11/18/24 6:57 PM, Damien Le Moal wrote:
>> On 11/19/24 9:27 AM, Bart Van Assche wrote:
>>> If the queue is filled with unaligned writes then the following
>>> deadlock occurs:
>>>
>>> Call Trace:
>>>  <TASK>
>>>  __schedule+0x8cc/0x2190
>>>  schedule+0xdd/0x2b0
>>>  blk_queue_enter+0x2ce/0x4f0
>>>  blk_mq_alloc_request+0x303/0x810
>>>  scsi_execute_cmd+0x3f4/0x7b0
>>>  sd_zbc_do_report_zones+0x19e/0x4c0
>>>  sd_zbc_report_zones+0x304/0x920
>>>  disk_zone_wplug_handle_error+0x237/0x920
>>>  disk_zone_wplugs_work+0x17e/0x430
>>>  process_one_work+0xdd0/0x1490
>>>  worker_thread+0x5eb/0x1010
>>>  kthread+0x2e5/0x3b0
>>>  ret_from_fork+0x3a/0x80
>>>  </TASK>
>>>
>>> Fix this deadlock by removing the disk->fops->report_zones() call and by
>>> deriving the write pointer information from successfully completed zoned
>>> writes.
>>>
>>> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
>>
>> Doesn't this need a Fixes tag and a Cc: stable tag, and to come earlier in
>> the series (or be sent separately)?
>
> I will add Fixes: and Cc: stable tags.
>
> I'm not sure how to move this patch earlier since it depends on the
> previous patch in this series ("blk-zoned: Only handle errors after
> pending zoned writes have completed"). Without that patch, it is not
> safe to use zwplug->wp_offset_compl in the error handler.
>
>> Overall, this patch seems wrong anyway, as a zone reset or zone finish may
>> be done between two writes, failing the next one, but the recovery done
>> here will use the end position of the previous successful write as the wp,
>> which is NOT correct since the reset or finish changed it...
>
> I will add support for the zone reset and zone finish commands in this
> patch.
>
>> And we also have the possibility of torn writes (partial writes) with
>> SAS SMR drives. So I really think that you cannot avoid doing a report
>> zones to recover from errors.
>
> Thanks for having brought this up. This is something I was not aware of.
>
> disk_zone_wplug_handle_error() submits a new request to retrieve zone
> information while handling an error triggered by other requests. This
> can easily lead to a deadlock as the above call trace shows. How about
> introducing a queue flag for the "report zones" approach in
> disk_zone_wplug_handle_error() such that the "report zones" approach is
> only used for SAS SMR drives?

Sure, but how would that solve the potential deadlock problem?

Also, I am not entirely clear on how the deadlock can happen, given that
zone write plugs queue and block BIOs, not requests. So even assuming you
have a large number of BIOs plugged in a zone write plug, the error
handler work should still be able to issue a request to do a report
zones, no? On which resource can the deadlock happen? Plugged BIOs do not
yet use a tag, right? What am I missing here?

Or is it maybe something that can happen only with your modifications,
because you changed the zone write plug behavior to allow more than one
BIO at a time to be unplugged and issued to the device?

Note that if you do have a test case that triggers the deadlock, we
definitely need to solve this and ideally have a blktests case checking
for it.

--
Damien Le Moal
Western Digital Research
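
[Editor's illustration, not part of the thread: below is a minimal user-space
sketch of the write-pointer bookkeeping the patch description refers to,
namely advancing a per-zone "completed write" offset on successful zoned
writes and updating it for zone reset and zone finish. Only the field name
wp_offset_compl comes from the discussion; the struct, helper names, and
values are hypothetical and do not reflect the actual kernel patch.]

/* Build with: cc -Wall -o wp_sketch wp_sketch.c */
#include <stdio.h>

struct zone_wplug_sketch {
	unsigned long long zone_capacity;   /* writable sectors in the zone */
	unsigned long long wp_offset_compl; /* sectors written and completed */
};

/* Called when a zoned write of @nr_sectors at @sector_offset completes OK. */
static void zwplug_write_completed(struct zone_wplug_sketch *zwplug,
				   unsigned long long sector_offset,
				   unsigned long long nr_sectors)
{
	/* Only advance if the write landed at the tracked write pointer. */
	if (sector_offset == zwplug->wp_offset_compl)
		zwplug->wp_offset_compl += nr_sectors;
}

/* Zone reset moves the write pointer back to the start of the zone. */
static void zwplug_zone_reset(struct zone_wplug_sketch *zwplug)
{
	zwplug->wp_offset_compl = 0;
}

/* Zone finish moves the write pointer to the end of the writable area. */
static void zwplug_zone_finish(struct zone_wplug_sketch *zwplug)
{
	zwplug->wp_offset_compl = zwplug->zone_capacity;
}

int main(void)
{
	struct zone_wplug_sketch z = { .zone_capacity = 524288 };

	zwplug_write_completed(&z, 0, 8);  /* aligned: wp advances to 8 */
	zwplug_write_completed(&z, 16, 8); /* not at wp: wp unchanged */
	printf("wp after writes: %llu\n", z.wp_offset_compl);

	zwplug_zone_finish(&z);
	printf("wp after finish: %llu\n", z.wp_offset_compl);

	zwplug_zone_reset(&z);
	printf("wp after reset:  %llu\n", z.wp_offset_compl);
	return 0;
}

[As the thread points out, this bookkeeping alone cannot cover torn
(partial) writes on SAS SMR drives, which is why a report zones based
recovery path is still being discussed for that case.]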