Hi Damien, On Fri, Sep 29, 2023 at 3:37 PM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote: > On 2023/09/28 14:26, Geert Uytterhoeven wrote: > > On Tue, Sep 26, 2023 at 10:15 AM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote: > >> The first 9 patches of this series fix several issues with suspend/resume > >> power management operations in scsi and libata. The most significant > >> changes introduced are in patch 4 and 5, where the manage_start_stop > >> flag of scsi devices is split into the manage_system_start_stop and > >> manage_runtime_start_stop flags to allow keeping scsi runtime power > >> operations for spining up/down ATA devices but have libata do its own > >> system suspend/resume device power state management using EH. > >> > >> The remaining patches are code cleanup that do not introduce any > >> significant functional change. > >> > >> This series was tested on qemu and on various PCs and servers. I am > >> CC-ing people who recently reported issues with suspend/resume. > >> Additional testing would be much appreciated. > > > > JFTR, with current libata/for-next[*], I saw the following with > > rcar-sata, once (interesting lines marked with "!"): > > > > PM: suspend entry (s2idle) > > Filesystems sync: 0.026 seconds > > Freezing user space processes > > ! ata1.00: qc timeout after 10000 msecs (cmd 0x40) > > Freezing user space processes completed (elapsed 0.007 seconds) > > ! ata1.00: VERIFY failed (err_mask=0x4) > > OOM killer disabled. > > ! ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40) > > Freezing remaining freezable tasks > > ! ata1.00: revalidation failed (errno=-5) > > Freezing remaining freezable tasks completed (elapsed 0.002 seconds) > > sd 0:0:0:0: [sda] Synchronizing SCSI cache > > ata1: link resume succeeded after 1 retries > > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > > ata1.00: configured for UDMA/133 > > ata1.00: Entering active power mode > > ata1.00: Entering standby power mode > > ravb e6800000.ethernet eth0: Link is Down > > Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached > > PHY driver (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=136) > > OOM killer enabled. > > Restarting tasks ... done. > > random: crng reseeded on system resumption > > PM: suspend exit > > ata1: link resume succeeded after 1 retries > > ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > > ata1.00: Entering active power mode > > ata1.00: configured for UDMA/133 > > ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off > > > > Regardless, the disk worked fine after resume. > > > > Note that I saw this only once. > > I think I found the reason for this, but to confirm, were you doing a suspend > right after resuming the system ? If yes, that I think I exactly understand the > issue and why you saw it only once (it is a subtle race with scheduling > libata-EH suspend/resume operations). I will send a fix next week. Now you ask that, yes there was a system suspend before. Relevant log with timing info: [ 130.177616] PM: suspend exit [ 130.257981] ata1: link resume succeeded after 1 retries [ 130.376714] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 130.388525] ata1.00: Entering active power mode so the drive should have been ready here. [ 140.452669] PM: suspend entry (s2idle) [ 140.488313] Filesystems sync: 0.026 seconds [ 140.515957] Freezing user space processes [ 140.518209] ata1.00: qc timeout after 10000 msecs (cmd 0x40) [ 140.523384] Freezing user space processes completed (elapsed 0.007 seconds) [ 140.527718] ata1.00: VERIFY failed (err_mask=0x4) [ 140.532541] OOM killer disabled. [ 140.537270] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40) [ 140.542069] Freezing remaining freezable tasks [ 140.546784] ata1.00: revalidation failed (errno=-5) Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds