On 9/14/23 15:53, Geert Uytterhoeven wrote: > Hi Damien, > > On Thu, Sep 14, 2023 at 12:03 AM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote: >> On 9/13/23 19:21, Geert Uytterhoeven wrote: >>>> Thanks for the report. The delay for the first data access from user space right >>>> after resume is 100% expected, with or without this patch. The reason is that >>>> waking up the drive (spinning it up) is done asynchronously from the PM resume >>>> context, so when you get "PM suspend exit" message signaling that the system is >>>> resumed, the drive may not yet be spinning. Any access will wait for that to >>>> happen before proceeding. Depending on the drive that can take up to 10s or so. >>> >>> That does not match with what I am seeing: before this patch, there >>> was no delay on first data access from user space, as the drive is fully >>> spun up when system resume returns. >> >> Yes, that is a possibility, but not by design. Some users have complained about >> the long resume times which causes laptop screens to be "black" until disks spin >> up. That did not happen before 5.16, when the change to scsi using the PM async >> ops to do suspend/resume created all the issues with suspend/resume on ata side. >> I am going to run 5.15 again to check. >> >> The patch "do not issue START STOP UNIT on resume" was a botch attempt to remove >> this delay. But it is a bad one because the ATA specs define that a drive can >> get out of standby (spun down) power state only with a media access command (a >> VERIFY command is used to spin up the disk). And furthermore, the specs also >> says that even a reset shall not change the device power state. So issuing a >> VERIFY command to spin up the drive is required per specs. Note that I do see >> many of my drives (I have hundreds in the lab) spinning up on reset, which is >> against the specs. But not all of them. So with the patch "do not issue START >> STOP UNIT on resume", one risks not seeing the drive resuming correctly (timeout >> errors on IDENTIFY command issued on resume will get the drive removed). >> >>> With this patch, system resume returns earlier, and the drive is only >>> spun up when user space starts accessing data. >> >> Yes, but "when user space starts accessing data" -> "user space accesses are >> processed only after the drive completes spinning up" would be more accurate. > > Sure, I wrote it down in terms of what the user is experiencing, which may > not be identical to what's happening under the hood. > >> That is by design and expected. This is the behavior one would see if the drive >> is set to use standby timers (to go to standby on its own after some idle time) >> or if runtime suspend is used. I do not see any issue with this behavior. Is >> this causing any issue on your end ? Do you have concerns about this approach ? >> >> Having the resume process wait for the drive to fully spin-up is easy to do. But >> as I mentioned, many users are really not happy about how slow resuming become >> with that. If I do that, you will get back the previous behavior you mention, >> but I will be again getting emails about "resume is broken". >> >> I made a decision: no waiting for spinup. That causes a delay for the user on >> first access, but that makes resume overall far faster. I do not want to go back >> on that, unless you can confirm that there is a real regression/error/data >> corruption happening. > > I agree that is fine. Still, R-CAR is not working for you with the patch adding the link. Looking into it now. Trying to verify things with qemu & regular AHCI on PCs too as I do not have an R-CAR board. Hard to debug :) Preparing some patches for you to get more debug info. > > Gr{oetje,eeting}s, > > Geert > -- Damien Le Moal Western Digital Research