On 10/13/23 04:01, Phillip Susi wrote:
> Damien Le Moal <dlemoal () kernel ! org> writes:
>
>> In theory, yes, that was the intent. In practice, the verify was issued from
>> scsi PM resume context while the actual drive port reset + revalidation is done
>> in libata EH context, triggered from ATA port resume context which itself was
>> not synchronized/ordered with the scsi disk resume. So we ended up with the
>> verify command execution sometimes being attempted with the drive not even
>> revalidated yet, or with the port/link not even active sometimes (depending on
>> timing). So problems all over and deadlocks due to scsi revalidate using the
>> device lock, which PM use too.
>
> Yikes.
>
>> See above. With the switch to async PM ops in scsi in kernel 5.16, things broke
>> badly due to the lack of synchronization that sync PM provided before that.
>
> Yes, but without async PM ops, the IDENTIFY command that was not
> preceeded by a VERIFY worked just fine, right?

Yes. I rechecked the specs regarding this and there is nothing preventing
IDENTIFY from completing with the drive spun down. The only corner case is
when PUIS is enabled, in which case IDENTIFY may return incomplete data. But
that is handled already and that is not something we can get with a system
suspend/resume or runtime suspend/resume.

>> ACS defines that only media access commands can get a drive out of standby mode
>> back into active mode. So an IDENTIFY command would not (normally)
>> spinup a
>
> Right, it won't CAUSE the drive to spin up, but if it is already in the
> process of spinning up ( due to the reset ), then the drive will finish
> spinning up before answering the IDENTIFY command. Or do you think that
> some drives may handle the IDENTIFY wrong if they are still in the
> process of spinning up?

From re-reading the specs and testing with all my drives, the port reset spins
up the drives and IDENTIFY completes OK before the spinup completes, so there
is no delay.

I CC-ed you a couple of patches that move the VERIFY command issuing to after
revalidation (that is, after execution of IDENTIFY, READ LOG, etc.). That
works well. I also added a CHECK POWER MODE command to check if sending the
verify is actually needed. And even while the disks are spinning up, I get
power mode 0xFF indicating ACTIVE state, so no need to send the VERIFY command
at all. The end result is that we get to finish the libata EH context doing
the resume well before the disk finishes spinning up (which can take 10+
seconds).

With this, the first read or write command following the resume will be
delayed until the drive finishes spinning up. But that is fine given the
default 30s timeout and retries. I do not expect any problems with that.

-- 
Damien Le Moal
Western Digital Research
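
For reference, the resume-time decision described above can be sketched
roughly as follows. This is standalone C for illustration, not the libata
code from the patches: the names resume_needs_verify() and
POWER_MODE_ACTIVE_OR_IDLE and the check_failed flag are hypothetical. The
0xFF value is the power mode reported in the testing above. Treating a failed
CHECK POWER MODE as "spin-up needed" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical name. 0xFF is the value CHECK POWER MODE returned in the
 * testing described above, even while the disks were still spinning up.
 */
#define POWER_MODE_ACTIVE_OR_IDLE 0xFF

/*
 * Decide, after revalidation (IDENTIFY, READ LOG, ...) has completed,
 * whether a VERIFY command is still needed to spin the drive up.
 *
 * power_mode:   value returned by CHECK POWER MODE
 * check_failed: true if the CHECK POWER MODE command itself failed
 */
bool resume_needs_verify(uint8_t power_mode, bool check_failed)
{
	/* Assumption: if the power mode is unknown, force a spin-up. */
	if (check_failed)
		return true;

	/* Drive already reports active: no VERIFY needed. */
	return power_mode != POWER_MODE_ACTIVE_OR_IDLE;
}
```

With the drives tested above, the mode read back after the port reset is
0xFF, so a check like this returns false and no VERIFY is sent.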