Re: [PATCH v3 07/23] ata: libata-scsi: Fix delayed scsi_rescan_device() execution

On 9/15/23 10:14, Damien Le Moal wrote:
Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) It relies on a PM internal field that should not be used without PM
    device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are
    not atomic and a suspend PM event may be triggered between them,
    causing scsi_rescan_device() to be called on a suspended device and
    in that function blocking while holding the scsi device lock. This
    would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:

for (( i=0; i<10; i++ )); do
	echo "+2" > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done

that triggers a resume 2 seconds after starting to suspend a system can
quickly lead to a PM deadlock preventing the system from correctly
resuming.

Fix this by replacing the check on is_suspended with a check on the
return value given by scsi_rescan_device() as that function will fail if
called against a suspended device. Also make sure rescan tasks already
scheduled are first cancelled before suspending an ata port.
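
The change in control flow described above can be sketched in user space
as follows. This is only an illustrative model, not the patch itself:
struct mock_sdev, mock_rescan_device() and mock_dev_rescan_work() are
hypothetical names, and the use of -EWOULDBLOCK as the failure code is an
assumption. The point is that the caller no longer tests the suspended
state itself; it lets the rescan function refuse and treats the error as
"try again later". Cancelling pending rescan work on port suspend is not
modelled here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a SCSI device; not a kernel type. */
struct mock_sdev {
	bool suspended;	/* models the PM "suspended" state of the device */
	int rescans;	/* number of rescans that actually ran */
};

/*
 * Models a rescan function that refuses to run on a suspended device.
 * Assumption: the refusal is reported as -EWOULDBLOCK.
 */
int mock_rescan_device(struct mock_sdev *sdev)
{
	if (sdev->suspended)
		return -EWOULDBLOCK;
	sdev->rescans++;
	return 0;
}

/*
 * Models the rescan work item. There is no separate check of the
 * suspended flag here: the callee decides, and a failure means the
 * work should be scheduled again later.
 * Returns true if the work must be rescheduled.
 */
bool mock_dev_rescan_work(struct mock_sdev *sdev)
{
	return mock_rescan_device(sdev) != 0;
}
```

With this shape, a suspend event arriving just before the rescan only
results in a retry request rather than a rescan running against a
suspended device.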

Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
  drivers/ata/libata-core.c | 16 ++++++++++++++++
  drivers/ata/libata-scsi.c | 33 +++++++++++++++------------------
  2 files changed, 31 insertions(+), 18 deletions(-)

Reviewed-by: Hannes Reinecke <hare@xxxxxxx>


