On 9/10/23 21:02, Damien Le Moal wrote:
Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume") modified ata_scsi_dev_rescan() to check the scsi device "is_suspended" power field to ensure that the scsi device associated with an ATA device is fully resumed when scsi_rescan_device() is executed. However, this fix is problematic as: 1) it relies on a PM internal field that should not be used without PM device locking protection. 2) The check for is_suspended and the call to ata_scsi_dev_rescan() are not atomic and a suspend PM even may be triggered between them, casuing ata_scsi_dev_rescan() to be called on a suspended device, resulting in that function blocking while holding the scsi device lock, which would deadlock a following resume operation. These problems can trigger PM deadlocks on resume, especially with resume operations triggered quickly after or during suspend operations. E.g., a simple bash script like: for (( i=0; i<10; i++ )); do echo "+2 > /sys/class/rtc/rtc0/wakealarm echo mem > /sys/power/state done that triggers a resume 2 seconds after starting suspending a system can quickly lead to a PM deadlock preventing the system from correctly resuming. Fix this by replacing the check on is_suspended with a check on the scsi device state inside ata_scsi_dev_rescan(), while holding the scsi device lock, thus making the device rescan atomic with regard to PM operations. Additionnly, make sure that scheduled rescan tasks are first cancelled before suspending an ata port.
One patch per subsystem please. I think this patch can be split easily into an ATA patch and a SCSI core patch.
Thanks, Bart.