On 2023/09/28 14:26, Geert Uytterhoeven wrote:
> Hi Damien,
>
> (oops, found a two-day old email still in draft)
>
> On Tue, Sep 26, 2023 at 10:15 AM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote:
>> The first 9 patches of this series fix several issues with suspend/resume
>> power management operations in scsi and libata. The most significant
>> changes introduced are in patch 4 and 5, where the manage_start_stop
>> flag of scsi devices is split into the manage_system_start_stop and
>> manage_runtime_start_stop flags to allow keeping scsi runtime power
>> operations for spinning up/down ATA devices but have libata do its own
>> system suspend/resume device power state management using EH.
>>
>> The remaining patches are code cleanup that do not introduce any
>> significant functional change.
>>
>> This series was tested on qemu and on various PCs and servers. I am
>> CC-ing people who recently reported issues with suspend/resume.
>> Additional testing would be much appreciated.
>
> JFTR, with current libata/for-next[*], I saw the following with
> rcar-sata, once (interesting lines marked with "!"):
>
> PM: suspend entry (s2idle)
> Filesystems sync: 0.026 seconds
> Freezing user space processes
> ! ata1.00: qc timeout after 10000 msecs (cmd 0x40)
> Freezing user space processes completed (elapsed 0.007 seconds)
> ! ata1.00: VERIFY failed (err_mask=0x4)
> OOM killer disabled.
> ! ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
> Freezing remaining freezable tasks
> ! ata1.00: revalidation failed (errno=-5)
> Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> ata1: link resume succeeded after 1 retries
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: configured for UDMA/133
> ata1.00: Entering active power mode
> ata1.00: Entering standby power mode
> ravb e6800000.ethernet eth0: Link is Down
> Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached
> PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=136)
> OOM killer enabled.
> Restarting tasks ... done.
> random: crng reseeded on system resumption
> PM: suspend exit
> ata1: link resume succeeded after 1 retries
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata1.00: Entering active power mode
> ata1.00: configured for UDMA/133
> ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
>
> Regardless, the disk worked fine after resume.
>
> Note that I saw this only once.

I think I found the reason for this, but to confirm, were you doing a suspend
right after resuming the system? If yes, then I think I understand exactly what
the issue is and why you saw it only once (it is a subtle race with scheduling
libata-EH suspend/resume operations). I will send a fix next week.

-- 
Damien Le Moal
Western Digital Research