On Mon, Sep 11, 2023 at 01:02:03PM +0900, Damien Le Moal wrote:
> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
> device resume") modified ata_scsi_dev_rescan() to check the scsi device
> "is_suspended" power field to ensure that the scsi device associated
> with an ATA device is fully resumed when scsi_rescan_device() is
> executed. However, this fix is problematic as:
> 1) It relies on a PM internal field that should not be used without PM
>    device locking protection.
> 2) The check for is_suspended and the call to ata_scsi_dev_rescan() are
>    not atomic and a suspend PM event may be triggered between them,
>    causing ata_scsi_dev_rescan() to be called on a suspended device,
>    resulting in that function blocking while holding the scsi device
>    lock, which would deadlock a following resume operation.
> These problems can trigger PM deadlocks on resume, especially with
> resume operations triggered quickly after or during suspend operations.
> E.g., a simple bash script like:
>
> for (( i=0; i<10; i++ )); do
>     echo "+2" > /sys/class/rtc/rtc0/wakealarm
>     echo mem > /sys/power/state
> done
>
> that triggers a resume 2 seconds after starting to suspend the system can
> quickly lead to a PM deadlock preventing the system from correctly
> resuming.
>
> Fix this by replacing the check on is_suspended with a check on the scsi
> device state inside ata_scsi_dev_rescan(), while holding the scsi device
> lock, thus making the device rescan atomic with regard to PM operations.
> Additionally, make sure that scheduled rescan tasks are first cancelled
> before suspending an ata port.
>
> Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>

Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@xxxxxxxxxxxxx>
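For anyone less familiar with this class of bug, here is a minimal user-space analogue of the idea behind the fix. It is not the libata or SCSI code: struct dev, dev_suspend(), dev_resume() and dev_rescan() are hypothetical names, and a pthread mutex stands in for the scsi device lock. The point it shows is that the state check and the rescan happen under the same lock the suspend path takes, so a suspend cannot slip in between them.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, NOT the libata/SCSI structures. */
enum dev_state { DEV_RUNNING, DEV_SUSPENDED };

struct dev {
	pthread_mutex_t lock;  /* protects state; held across a rescan */
	enum dev_state state;
	int rescans;           /* counts rescans that actually ran */
};

/* Suspend/resume paths only change the state with the lock held. */
static void dev_suspend(struct dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->state = DEV_SUSPENDED;
	pthread_mutex_unlock(&d->lock);
}

static void dev_resume(struct dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->state = DEV_RUNNING;
	pthread_mutex_unlock(&d->lock);
}

/*
 * Racy pattern (what the fix removes), for contrast:
 *
 *     if (d->state == DEV_RUNNING) {   // checked without the lock
 *             // a suspend may happen right here
 *             pthread_mutex_lock(&d->lock);
 *             ...rescan a now-suspended device...
 *
 * Fixed pattern: check and act under the same lock, so the check
 * cannot go stale before the rescan runs. Returns true if the rescan
 * ran, false if it was skipped because the device is suspended.
 */
static bool dev_rescan(struct dev *d)
{
	bool done = false;

	pthread_mutex_lock(&d->lock);
	if (d->state == DEV_RUNNING) {
		d->rescans++;  /* placeholder for the real rescan work */
		done = true;
	}
	pthread_mutex_unlock(&d->lock);
	return done;
}
```

The sketch only covers the atomic check-and-rescan part; how a skipped rescan gets retried, and the cancellation of pending rescan work before suspending the port, are left out.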