Re: [Regression] Hang deleting ATA HDD device for undocking

On 1/10/24 15:44, Kevin Locke wrote:
> On Wed, 2024-01-10 at 11:18 +0900, Damien Le Moal wrote:
>> On 1/9/24 02:56, Kevin Locke wrote:
>>> On a ThinkPad T430 running Linux 6.7, when I attempt to delete the ATA
>>> device for a hard drive in the Ultrabay slot (to hotswap/undock it[1])
>>> the process freezes in an uninterruptible sleep.  Specifically, if I run
>>>
>>>     echo 1 >/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/delete
>>>
>>> The shell process hangs in the write(2) syscall.  The last dmesg
>>> entries post hang are:
>>>
>>>     sd 1:0:0:0: [sda] Synchronizing SCSI cache
>>>     ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>     ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>>     ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>>     ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>>     ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>>     ata2.00: configured for UDMA/133
>>
>> It looks like the device was sleeping or was in standby state.
>> If that is the case, then we may be deadlocking with the scsi revalidate done
>> when waking up a drive. Can you confirm what the power state of the drive was
>> when you ran this ? Do you see an issue if you first make sure that the drive is
>> spun-up ?
> 
> Bingo!  I can confirm that I do not experience the issue when the
> drive is in the active/idle state, nor in standby (hdparm -y).  I only
> experience the issue if the drive is in the sleep (hdparm -Y) state.

hdparm -Y does not use the kernel power management core. There is a hack in
libata to track that a SLEEP command was issued to the device and then mark it
with ATA_DFLAG_SLEEPING. If this flag is set, the port is reset to spin up the
drive whenever a new command is received. That is what is causing the problem
here, as a reset (EH running) is not supposed to happen when the scsi drive is
going away.

This all boils down to the scsi disk not being in sync with its underlying ata
device: the scsi disk is not marked as spun-down/standby, and so a flush cache
is issued. The problem is that it is not realistic to track and maintain the
system device state by catching/parsing passthrough commands such as those
issued by hdparm.

That said, a hang is still not acceptable. We will see how to avoid it (it is
not trivial).

> 
> Thanks for the incisive suggestion,
> Kevin
> 
> P.S. The Ultrabay eject script[1] runs `hdparm -Y` before delete.
> Should that be removed (or changed to standby) to avoid unnecessary
> revalidation?  For both HDD and SSD?

Removing a scsi disk will put the backing ata device in standby state as part of
the removal. So you do not need to do that manually. Remove that "hdparm -Y"
from your script.
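
As a rough illustration only, the delete step without the SLEEP command could
look something like the sketch below. The function name ultrabay_delete and its
argument handling are made up for this example and are not taken from the
ThinkWiki script; the argument is assumed to be the sysfs directory of the scsi
device (the directory that holds the "delete" attribute), and any filesystems on
the disk are assumed to be unmounted already.

```shell
#!/bin/sh
# Hypothetical helper (name and interface are assumptions for illustration).
# $1: sysfs directory of the scsi device, e.g. the one from the report:
#     /sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0
ultrabay_delete() {
    dev_dir=$1

    # Bail out early if the delete attribute is missing or not writable.
    if [ ! -w "$dev_dir/delete" ]; then
        echo "no writable delete attribute under $dev_dir" >&2
        return 1
    fi

    # Deliberately no "hdparm -Y" here: the drive is left in whatever power
    # state it is in, and the removal itself takes care of spinning it down.
    echo 1 > "$dev_dir/delete"
}
```

It would be called with the device directory as its only argument, with the
rest of the eject sequence left as it is.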

> 
> [1]: https://www.thinkwiki.org/wiki/How_to_hotswap_Ultrabay_devices#Script_for_Ultrabay_eject
> 

-- 
Damien Le Moal
Western Digital Research




