Re: [PATCH v16 05/26] blk-zoned: Fix a deadlock triggered by unaligned writes

Bart Van Assche <bvanassche@xxxxxxx> · Tue, 19 Nov 2024 13:04:50 -0800

On 11/18/24 6:57 PM, Damien Le Moal wrote:
> On 11/19/24 9:27 AM, Bart Van Assche wrote:
>> If the queue is filled with unaligned writes then the following
>> deadlock occurs:
>>
>> Call Trace:
>>   <TASK>
>>   __schedule+0x8cc/0x2190
>>   schedule+0xdd/0x2b0
>>   blk_queue_enter+0x2ce/0x4f0
>>   blk_mq_alloc_request+0x303/0x810
>>   scsi_execute_cmd+0x3f4/0x7b0
>>   sd_zbc_do_report_zones+0x19e/0x4c0
>>   sd_zbc_report_zones+0x304/0x920
>>   disk_zone_wplug_handle_error+0x237/0x920
>>   disk_zone_wplugs_work+0x17e/0x430
>>   process_one_work+0xdd0/0x1490
>>   worker_thread+0x5eb/0x1010
>>   kthread+0x2e5/0x3b0
>>   ret_from_fork+0x3a/0x80
>>   </TASK>
>>
>> Fix this deadlock by removing the disk->fops->report_zones() call and by
>> deriving the write pointer information from successfully completed zoned
>> writes.
>>
>> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>

> Doesn't this need a Fixes tag and CC stable, and come earlier in the series (or
> be sent separately)?

I will add Fixes: and Cc: stable tags.

I'm not sure how to move this patch earlier since it depends on the
previous patch in this series ("blk-zoned: Only handle errors after
pending zoned writes have completed"). Without that patch, it is not
safe to use zwplug->wp_offset_compl in the error handler.

> Overall, this patch seems wrong anyway as zone reset and zone finish may be
> done between 2 writes, failing the next one but the recovery done here will use
> the previous successful write end position as the wp, which is NOT correct as
> reset or finish changed that...

I will add support for the zone reset and zone finish commands in this
patch.
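
As a rough illustration of the idea (not the patch itself), tracking the
write pointer from successful completions, including zone reset and zone
finish, could look like the sketch below. Apart from wp_offset_compl, all
names here are hypothetical, and the model deliberately ignores torn
writes, where a completion-derived write pointer cannot be trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation types; not the kernel's REQ_OP_* values. */
enum zone_op { ZONE_OP_WRITE, ZONE_OP_RESET, ZONE_OP_FINISH };

/* Hypothetical, simplified stand-in for per-zone write plug state. */
struct zone_wp_state {
	uint32_t zone_sectors;    /* zone size in sectors */
	uint32_t wp_offset_compl; /* wp offset implied by completed operations */
};

/*
 * Update the tracked write pointer after an operation completed
 * successfully. Returns false when the completion does not match the
 * tracked state, in which case the tracked value is left unchanged and
 * the caller would need another way to learn the real write pointer.
 */
bool zone_wp_complete(struct zone_wp_state *z, enum zone_op op,
		      uint32_t start_offset, uint32_t nr_sectors)
{
	assert(z->wp_offset_compl <= z->zone_sectors);

	switch (op) {
	case ZONE_OP_WRITE:
		/* A sequential write must start at the write pointer. */
		if (start_offset != z->wp_offset_compl)
			return false;
		/* ... and must not cross the end of the zone. */
		if (nr_sectors > z->zone_sectors - z->wp_offset_compl)
			return false;
		z->wp_offset_compl += nr_sectors;
		return true;
	case ZONE_OP_RESET:
		/* Reset rewinds the write pointer to the zone start. */
		z->wp_offset_compl = 0;
		return true;
	case ZONE_OP_FINISH:
		/* Finish moves the write pointer to the end of the zone. */
		z->wp_offset_compl = z->zone_sectors;
		return true;
	}
	return false;
}
```

With this model, a reset or finish issued between two writes updates the
tracked offset, so a later failed write is no longer recovered against a
stale position.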

> And we also have the possibility of torn writes
> (partial writes) with SAS SMR drives. So I really think that you cannot avoid
> doing a report zone to recover errors.

Thanks for having brought this up. This is something I was not aware of.

disk_zone_wplug_handle_error() submits a new request to retrieve zone 
information while handling an error triggered by other requests. This
can easily lead to a deadlock as the above call trace shows. How about
introducing a queue flag for the "report zones" approach in
disk_zone_wplug_handle_error() such that the "report zones" approach is
only used for SAS SMR drives?
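
A minimal sketch of that selection logic, assuming a per-queue flag; the
flag name, the struct and the enum below are all hypothetical and do not
exist in the kernel:

```c
#include <stdbool.h>

/* Hypothetical flag: device may tear writes, so the wp must be re-read. */
#define QSKETCH_ZONE_ERR_REPORT_ZONES (1u << 0)

/* Hypothetical stand-in for the request queue; only the flags matter here. */
struct queue_sketch {
	unsigned int flags;
};

enum wp_recovery {
	WP_FROM_COMPLETIONS,  /* derive the wp from completed zoned writes */
	WP_FROM_REPORT_ZONES, /* issue a report zones command to the device */
};

/* Pick the write pointer recovery method for a failed zoned write. */
enum wp_recovery choose_wp_recovery(const struct queue_sketch *q)
{
	if (q->flags & QSKETCH_ZONE_ERR_REPORT_ZONES)
		return WP_FROM_REPORT_ZONES;
	return WP_FROM_COMPLETIONS;
}
```

Only drivers for devices that can tear writes would set the flag, so the
other devices would never issue a new request from the error handler.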

Thanks,

Bart.