Re: [PATCH v7 00/23] Fix libata suspend/resume handling and code cleanup

Damien Le Moal <dlemoal@xxxxxxxxxx> · Fri, 29 Sep 2023 15:37:24 +0200

On 2023/09/28 14:26, Geert Uytterhoeven wrote:
> Hi Damien,
> 
> (oops, found a two-day old email still in draft)
> 
> On Tue, Sep 26, 2023 at 10:15 AM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote:
>> The first 9 patches of this series fix several issues with suspend/resume
>> power management operations in scsi and libata. The most significant
>> changes introduced are in patch 4 and 5, where the manage_start_stop
>> flag of scsi devices is split into the manage_system_start_stop and
>> manage_runtime_start_stop flags to allow keeping scsi runtime power
>> operations for spining up/down ATA devices but have libata do its own
>> system suspend/resume device power state management using EH.
>>
>> The remaining patches are code cleanup that do not introduce any
>> significant functional change.
>>
>> This series was tested on qemu and on various PCs and servers. I am
>> CC-ing people who recently reported issues with suspend/resume.
>> Additional testing would be much appreciated.
> 
> JFTR, with current libata/for-next[*], I saw the following with
> rcar-sata, once (interesting lines marked with "!"):
> 
>     PM: suspend entry (s2idle)
>     Filesystems sync: 0.026 seconds
>     Freezing user space processes
>  !  ata1.00: qc timeout after 10000 msecs (cmd 0x40)
>     Freezing user space processes completed (elapsed 0.007 seconds)
>  !  ata1.00: VERIFY failed (err_mask=0x4)
>     OOM killer disabled.
>  !  ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
>     Freezing remaining freezable tasks
>  !  ata1.00: revalidation failed (errno=-5)
>     Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
>     sd 0:0:0:0: [sda] Synchronizing SCSI cache
>     ata1: link resume succeeded after 1 retries
>     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     ata1.00: configured for UDMA/133
>     ata1.00: Entering active power mode
>     ata1.00: Entering standby power mode
>     ravb e6800000.ethernet eth0: Link is Down
>     Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached
> PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=136)
>     OOM killer enabled.
>     Restarting tasks ... done.
>     random: crng reseeded on system resumption
>     PM: suspend exit
>     ata1: link resume succeeded after 1 retries
>     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     ata1.00: Entering active power mode
>     ata1.00: configured for UDMA/133
>     ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
> 
> Regardless, the disk worked fine after resume.
> 
> Note that I saw this only once.

I think I found the reason for this, but to confirm, were you doing a suspend
right after resuming the system ? If yes, that I think I exactly understand the
issue and why you saw it only once (it is a subtle race with scheduling
libata-EH suspend/resume operations). I will send a fix next week.

-- 
Damien Le Moal
Western Digital Research