Re: [PATCH 03/15] scsi/block PM: Always set request queue runtime active in blk_post_runtime_resume()

Bart Van Assche <bvanassche@xxxxxxx> · Tue, 16 Nov 2021 20:58:58 -0800

On 11/16/21 6:44 PM, chenxiang wrote:
From: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>

John Garry reported a deadlock that occurs when trying to access a
runtime-suspended SATA device.  For obscure reasons, the rescan
procedure causes the link to be hard-reset, which disconnects the
device.

The rescan tries to carry out a runtime resume when accessing the
device.  scsi_rescan_device() holds the SCSI device lock and won't
release it until it can put commands onto the device's block queue.
This can't happen until the queue is successfully runtime-resumed or
the device is unregistered.  But the runtime resume fails because the
device is disconnected, and __scsi_remove_device() can't do the
unregistration because it can't get the device lock.

The best way to resolve this deadlock appears to be to allow the block
queue to start running again even after an unsuccessful runtime
resume.  The idea is that the driver or the SCSI error handler will
need to be able to use the queue to resolve the runtime resume
failure.

This patch removes the err argument to blk_post_runtime_resume() and
makes the routine always act as though the resume was successful.
This fixes the deadlock.

Reviewed-by: Bart Van Assche <bvanassche@xxxxxxx>