On 1/10/24 15:44, Kevin Locke wrote:
> On Wed, 2024-01-10 at 11:18 +0900, Damien Le Moal wrote:
>> On 1/9/24 02:56, Kevin Locke wrote:
>>> On a ThinkPad T430 running Linux 6.7, when I attempt to delete the ATA
>>> device for a hard drive in the Ultrabay slot (to hotswap/undock it[1])
>>> the process freezes in an uninterruptible sleep.  Specifically, if I run
>>>
>>> echo 1 >/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/delete
>>>
>>> The shell process hangs in the write(2) syscall.  The last dmesg
>>> entries post hang are:
>>>
>>> sd 1:0:0:0: [sda] Synchronizing SCSI cache
>>> ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>> ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> ata2.00: configured for UDMA/133
>>
>> It looks like the device was sleeping or was in standby state.
>> If that is the case, then we may be deadlocking with the scsi revalidate done
>> when waking up a drive. Can you confirm what the power state of the drive was
>> when you ran this? Do you see an issue if you first make sure that the drive is
>> spun-up?
>
> Bingo!  I can confirm that I do not experience the issue when the
> drive is in the active/idle state, nor in standby (hdparm -y).  I only
> experience the issue if the drive is in the sleep (hdparm -Y) state.

hdparm -Y does not use the kernel power management core. There is a hack in
libata to track that a SLEEP command was issued to the device and then mark it
with ATA_DFLAG_SLEEPING. If this flag is set, then the port is reset to spin up
the drive whenever a new command is received. That is what is causing the
problem here, as a reset (EH running) is not supposed to happen when the scsi
drive is going away.
This all boils down to the scsi disk not being in sync with its underlying ata
device: the scsi disk is not marked as spun-down/standby and so a flush cache is
issued. The problem is that it is not realistic to track and maintain the system
device state by catching/parsing passthrough commands such as those issued by
hdparm.

That said, a hang is still not acceptable. We will see how to avoid it (it is
not trivial).

>
> Thanks for the incisive suggestion,
> Kevin
>
> P.S. The Ultrabay eject script[1] runs `hdparm -Y` before delete.
> Should that be removed (or changed to standby) to avoid unnecessary
> revalidation?  For both HDD and SSD?

Removing a scsi disk will remove and put the backing ata device in standby
state. So you do not need to do that manually. Remove that "hdparm -Y" from your
script.

>
> [1]: https://www.thinkwiki.org/wiki/How_to_hotswap_Ultrabay_devices#Script_for_Ultrabay_eject
>

-- 
Damien Le Moal
Western Digital Research
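
As a rough illustration of the advice above, the delete step of an eject
script could be reduced to something like the sketch below. The function name
ultrabay_delete and its single argument (the path of the sysfs delete
attribute) are hypothetical and do not come from the ThinkWiki script; only the
"echo 1" write itself is taken from the report in this thread. The sync call is
an extra precaution assumed here, not something the kernel requires.

```shell
#!/bin/sh
# Hypothetical helper (name and argument are illustrative only).
# $1: path to the sysfs "delete" attribute of the scsi device.
ultrabay_delete() {
    delete_node=$1

    # Refuse to continue if the attribute is missing or not writable.
    if [ ! -w "$delete_node" ]; then
        echo "cannot write to $delete_node" >&2
        return 1
    fi

    # Assumed precaution: flush dirty pages before the disk goes away.
    sync

    # Deliberately no "hdparm -Y" here: removing the scsi disk already
    # puts the backing ata device in standby.
    echo 1 > "$delete_node"
}
```

With the path from the original report, this would be called as
ultrabay_delete /sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/delete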