Re: [PATCH v8 04/23] scsi: sd: Differentiate system and runtime start/stop management

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/22/23 02:56, Phillip Susi wrote:
> Damien Le Moal <dlemoal@xxxxxxxxxx> writes:
> 
>> In the hybernate (suspend to disk) case, yes, but not in the suspend
>> to RAM
> 
> No; in BOTH cases.  In S3, the PSU is shut down and only aux power is
> maintained to keep the sdram in auto refresh mode.  The main power rails
> that power the disk drives are shut down.  The drive can not tell
> whether suspend to disk or suspend to ram was used.

OK. I am a bit fuzzy on what ACPI modes do :)

>> Depends on the disk implementation. Not all disks put their metadata on media.
>> So some disks can start replying to commands like IDENTIFY even with
>> the disks
> 
> Yes, the standard allows either way and I'm sure that there are both out
> there, I just thought that the incomplete IDENTIFY because it needs the
> media case was the more common of the two.  At least I think my WD
> drives do that.

The incomplete IDENTIFY with PUIS is likely very common with old drives. Newer
ones may not rely on the media for metadata, especially enterprise drives.

>> not fully spun up yet. This difference shows up with (sometimes) seeing
>> "IDENTIFY failed" due to a timeout on resume. I have old drives that are slow to
>> spinup and show that. But that is handled with the retries with increased
>> timeouts. I think it would be nice to patch that though, with longer timeout for
>> IDENTIFY on resume, to avoid these error messages.
> 
> Isn't the timeout like 10 or 20 seconds and typical spinup times only
> 3-4?  Still, yes, a little longer timeout for the IDENTIFY after resume
> might be a good idea.

The timeouts for identify when scanning the device are defined by
ata_eh_identify_timeouts in libata-eh.c. So 5s, 10s and 30s defined in 3
retries. But on boot, without PUIS, the drive is already (mostly) spun up
generally when the scan code is executed. So even the short-ish first timeout of
5s is enough. But for resume, the time between restoring power and executing
IDENTIFY is much shorter, so the first 5s timeout is a little too short.
IDENTIFY will be OK with the second 10s timeout for most drives, but I do have
some that are very slow and endup needing the third 30s timeout.

To avoid that, I think we need to use the longer ata_eh_reset_timeouts or define
a set of resume timeouts (e.g. a ata_eh_resume_timeouts array).

>> I am not following. What problem are you trying to fix ?
> 
> I would like my archive disks to PuiS and just remain in standby after a
> system resume, because I hardly ever access them.  Thus it's putting
> unneccesary wear and tear on them to spin up, then spin right back down
> every time I system suspend and resume.

The problem here is the potentially incomplete IDENTIFY on resume. We need the
full IDENTIFY to ensure that nothing changed with the drive. So for a drive that
reports an incomplete IDENTIFY with PUIS, we need to spin up that drive on
resume, even if it was runtime suspended before the system suspend.

> I suspend/resume my desktop at least once a day, but I can see this
> being an even larger problem on a NAS that wants to use opportunistic
> suspend to save power.  Just because you say, want to connect to the
> management web interface, or have a scheduled cron job every hour that
> forces it to wake up from system suspend does not mean that the disk, or
> all disks, need to spin up each time.

Yes, that would be nice to have such feature, but I do not see how to reliably
implement that.

> Laptops also tend to system suspend/resume more frequently, it it seems
> to be becoming at least somewhat common to have a HDD for archive, and an
> SSD/NVME to run the system on, and the HDD is not likely going to be
> accessed every time the system suspends and resumes, so why bother
> spinning it up automatically?

Sure, but I think we can skip the spinup only and only if the drive has PUIS
enabled and reported a complete IDENTIFY: in this case, we can skip the SET
FEATURES command that spins up the drive to leave the drive spun down. Some more
code will be needed though so that we track that the drive is spun sdown because
of PUIS and not because of STANDBY IMMEDIATE, so that ata_dev_power_set_active()
issues a SET FEATURES to spinup the drive instead of the regular VERIFY command.

If the drive has PUIS and reported an incomplete IDENTIFY, we need to spin it up
to finish revalidation normally. And for drives that do not have PUIS or have
PUIS disabled, the drive will spin up on system resume, always. For these, we
could call ata_dev_power_set_standby() at the end of revalidation to be
consistent with the runtime suspend state (if the drive was runtime suspended
before the system suspend.

Note: I am still not able to see runtime suspend automatically suspend the ata
port if I enable a scsi disk autosuspend_delay_ms. That does not happen for me.
In fact, I cannot even read a ata port autosuspend delay:

cat /sys/class/ata_port/ata2/power/autosuspend_delay_ms
cat: /sys/class/ata_port/ata2/power/autosuspend_delay_ms: Input/output error

But I have:

cat /sys/class/ata_port/ata2/power/control
auto

So this is all weird. There is a bug somewhere. I need to dig into that.

-- 
Damien Le Moal
Western Digital Research




[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux