On 10/10/23 22:09, Phillip Susi wrote: > Damien Le Moal <dlemoal@xxxxxxxxxx> writes: > >> system suspend/resume operations, the ATA port used to connect the >> device will also be suspended and resumed, with the resume operation >> requiring re-validating the device link and the device itself. In this >> case, issuing a VERIFY command to spinup the disk must be done before >> starting to revalidate the device, when the ata port is being resumed. >> In such case, we must not allow the SCSI disk driver to issue START STOP >> UNIT commands. > > Why must a VERIFY be issued to spinup the disk before revalidating? We can most likely move that VERIFY to after revalidation. That should shorten delays on first access after resume as many drive do actually spinup with the reset done before revalidating (but note that the specs say that even a reset shall not take a drive out of standby mode, without specifying the reset type, so this is rather loosely defined). > Before these patches, by default, manage_start_stop was on, and so sd > would cause a VERIFY in the system resume path. That resume however ( > sd and its issuing START UNIT ), would have happened AFTER the link was > resumed and the ATA device was revalidated, woudldn't it? So at that In theory, yes, that was the intent. In practice, the verify was issued from scsi PM resume context while the actual drive port reset + revalidation is done in libata EH context, triggered from ATA port resume context which itself was not synchronized/ordered with the scsi disk resume. So we ended up with the verify command execution sometimes being attempted with the drive not even revalidated yet, or with the port/link not even active sometimes (depending on timing). So problems all over and deadlocks due to scsi revalidate using the device lock, which PM use too. > point, the drive would already be spinning. And if manage_start_stop > was disabled, then there would be no VERIFY at all, and this did not > seem to cause a problem before. See above. With the switch to async PM ops in scsi in kernel 5.16, things broke badly due to the lack of synchronization that sync PM provided before that. > > If this VERIFY were skipped, the next thing that would happen is for > ata_dev_revalidate() to issue IDENTIFY, which would wait for the drive > to spin up before returning wouldn't it? ( unless the drive has PuiS > enabled ). ACS defines that only media access commands can get a drive out of standby mode back into active mode. So an IDENTIFY command would not (normally) spinup a drive. Nor would READ LOG. Normally, IDENTIFY, READ LOG etc done in revalidate can be processed with the drive spun down (*but* I have seen drives that react badly to some of these commands when spun down). Hence why I put it first, before revalidate. Now that all the synchronization is in place and libata EH does its own manage_start_stop for system suspend/resume, I will see if moving the verify to the end of revalidate works. -- Damien Le Moal Western Digital Research