Hi Damien, On Thu, Sep 14, 2023 at 12:03 AM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote: > On 9/13/23 19:21, Geert Uytterhoeven wrote: > >> Thanks for the report. The delay for the first data access from user space right > >> after resume is 100% expected, with or without this patch. The reason is that > >> waking up the drive (spinning it up) is done asynchronously from the PM resume > >> context, so when you get "PM suspend exit" message signaling that the system is > >> resumed, the drive may not yet be spinning. Any access will wait for that to > >> happen before proceeding. Depending on the drive that can take up to 10s or so. > > > > That does not match with what I am seeing: before this patch, there > > was no delay on first data access from user space, as the drive is fully > > spun up when system resume returns. > > Yes, that is a possibility, but not by design. Some users have complained about > the long resume times which causes laptop screens to be "black" until disks spin > up. That did not happen before 5.16, when the change to scsi using the PM async > ops to do suspend/resume created all the issues with suspend/resume on ata side. > I am going to run 5.15 again to check. > > The patch "do not issue START STOP UNIT on resume" was a botch attempt to remove > this delay. But it is a bad one because the ATA specs define that a drive can > get out of standby (spun down) power state only with a media access command (a > VERIFY command is used to spin up the disk). And furthermore, the specs also > says that even a reset shall not change the device power state. So issuing a > VERIFY command to spin up the drive is required per specs. Note that I do see > many of my drives (I have hundreds in the lab) spinning up on reset, which is > against the specs. But not all of them. So with the patch "do not issue START > STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout > errors on IDENTIFY command issued on resume will get the drive removed). > > > With this patch, system resume returns earlier, and the drive is only > > spun up when user space starts accessing data. > > Yes, but "when user space starts accessing data" -> "user space accesses are > processed only after the drive completes spinning up" would be more accurate. Sure, I wrote it down in terms of what the user is experiencing, which may not be identical to what's happening under the hood. > That is by design and expected. This is the behavior one would see if the drive > is set to use standby timers (to go to standby on its own after some idle time) > or if runtime suspend is used. I do not see any issue with this behavior. Is > this causing any issue on your end ? Do you have concerns about this approach ? > > Having the resume process wait for the drive to fully spin-up is easy to do. But > as I mentioned, many users are really not happy about how slow resuming become > with that. If I do that, you will get back the previous behavior you mention, > but I will be again getting emails about "resume is broken". > > I made a decision: no waiting for spinup. That causes a delay for the user on > first access, but that makes resume overall far faster. I do not want to go back > on that, unless you can confirm that there is a real regression/error/data > corruption happening. I agree that is fine. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds