Re: [PATCH v8 04/23] scsi: sd: Differentiate system and runtime start/stop management

Damien Le Moal <dlemoal@xxxxxxxxxx> · Mon, 16 Oct 2023 07:09:30 +0900

On 10/13/23 23:36, Phillip Susi wrote:
> Damien Le Moal <dlemoal@xxxxxxxxxx> writes:
> 
>> Yes. I rechecked the specs regarding this and there is nothing preventing
>> IDENTIFY from completing with the drive spun down. The only corner case is when
>> PUIS is enabled, in which case IDENTIFY may return incomplete data. But that is
>> handled already and that is not something we can get with a system
>> suspend/resume or runtime suspend/resume.
> 
> It *IS* something we get on suspend/resume.  During suspend the drive
> loses power, and on resume, it regains power.  As far as the drive is
> concerned, the computer was shutdown and booted back up, so it powers up
> in standby.

In the hibernate (suspend to disk) case, yes, but not in the suspend to RAM
case. Anyway, the PUIS incomplete IDENTIFY case is already handled, so there are
no issues.

>> From re-reading the specs and testing with all my drives, the port reset spins
>> up the drives and IDENTIFY completes OK before the spinup completes, so there
>> is no delay.
> 
> Interesting.  I was under the impression that most disks have to read
> their serial number and possibly other information from the media in
> order to report that in IDENTIFY, and therefore, they would have to
> finish spinning up before they could return complete information.

Depends on the disk implementation. Not all disks put their metadata on media,
so some disks can start replying to commands like IDENTIFY before they are fully
spun up. This difference sometimes shows up as an "IDENTIFY failed" error due to
a timeout on resume. I have old drives that are slow to spin up and show that.
But that is handled by the retries with increased timeouts. I think it would be
nice to patch that, though, with a longer timeout for IDENTIFY on resume, to
avoid these error messages.

>> I CC-ed you a couple of patches that move the VERIFY command
>> issuing to after revalidation (so execution of IDENTIFY, READ LOG etc). That
>> works well. I also added a CHECK POWER MODE command to check if sending the
>> verify is actually needed. And even while the disks are spinning up, I get
>> power mode 0xFF indicating ACTIVE state, so no need to send the VERIFY command
>> at all. The end result is that we get to finish the libata EH context doing the
>> resume well before the disk finishes spinning up (which can take 10+ seconds).
>>
>> With this, the first read or write command following the resume will be delayed
>> until the drive finishes spinning up. But that is fine given the default 30s
>> timeout and retries. I do not expect any problems with that.
> 
> That looks very good.  I think I will try to adapt my old patch to allow
> the EH to return -EAGAIN and leave the drive in standby rather than
> force it to wake up with the VERIFY in the system resume path.  The EH
> can be retried later when the drive is actually accessed, and at that
> time it can force it to spin up.

I am not following. What problem are you trying to fix?

-- 
Damien Le Moal
Western Digital Research