Re: [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove

On 9/13/23 17:29, Damien Le Moal wrote:
On 9/14/23 05:50, Bart Van Assche wrote:
On 9/10/23 21:02, Damien Le Moal wrote:
If an error occurs when resuming a host adapter before the devices
attached to the adapter are resumed, the adapter low level driver may
remove the scsi host, resulting in a call to sd_remove() for the
disks of the host. However, since this function calls sd_shutdown(),
a synchronize cache command and a start stop unit may be issued with the
drive still sleeping and the HBA non-functional. This causes PM resume
to hang, forcing a reset of the machine to recover.

Fix this by checking the device's host state in sd_shutdown() and
returning early, doing nothing, if the host state is not SHOST_RUNNING.

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
---
   drivers/scsi/sd.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c92a317ba547..a415abb721d3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
 	if (!sdkp)
 		return;         /* this can happen */

-	if (pm_runtime_suspended(dev))
+	if (pm_runtime_suspended(dev) ||
+	    sdkp->device->host->shost_state != SHOST_RUNNING)
 		return;

 	if (sdkp->WCE && sdkp->media_present) {

Why test the host state instead of dev->power.runtime_status? I don't
think that it is safe to skip shutdown if the error handler is active.
If the error handler can recover the device, a SYNCHRONIZE CACHE command
should be submitted.

But there is no synchronization with EH that I can see anyway. At least for
sd_remove(), I would assume that this is called only once the device references
were all dropped, so presumably EH is not doing anything with the drive when
that happens, no?

In any case, looking at dev->power.runtime_status is not correct, as this is set
to RPM_ACTIVE when the device is suspended through system suspend. We could
replace the test "sdkp->device->host->shost_state != SHOST_RUNNING" with
"dev->power.is_suspended", as that indicates true (1) for a suspended device.
However, I really do not like that, as it is a PM-internal field and we should
not be accessing it directly. The PM code comments say as much. Any better idea?
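
To make the trade-off easier to see, here is a rough userspace model of the
three candidate checks. It is only a sketch, not kernel code: every mock_* and
MOCK_* name is made up for illustration, the struct just stands in for the
fields named in this thread, and pm_runtime_suspended() is reduced to a plain
comparison against the runtime status. The assumption that a host being
removed is no longer in the running state is mine as well.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel enums; not the real definitions. */
enum mock_rpm_status { MOCK_RPM_ACTIVE, MOCK_RPM_SUSPENDED };
enum mock_shost_state {
	MOCK_SHOST_RUNNING,
	MOCK_SHOST_CANCEL,	/* assumed state of a host being removed */
	MOCK_SHOST_RECOVERY	/* assumed state while the error handler runs */
};

/* Hypothetical flattened view of the state the three checks look at. */
struct mock_disk {
	enum mock_rpm_status runtime_status;	/* dev->power.runtime_status */
	bool is_suspended;			/* dev->power.is_suspended */
	enum mock_shost_state shost_state;	/* host->shost_state */
};

/* Check before the patch: runtime PM state only (simplified). */
static bool skip_shutdown_old(const struct mock_disk *d)
{
	return d->runtime_status == MOCK_RPM_SUSPENDED;
}

/* Check in the patch: also skip when the host is not running. */
static bool skip_shutdown_patch(const struct mock_disk *d)
{
	return d->runtime_status == MOCK_RPM_SUSPENDED ||
	       d->shost_state != MOCK_SHOST_RUNNING;
}

/* Alternative discussed above: use the PM-internal is_suspended flag. */
static bool skip_shutdown_is_suspended(const struct mock_disk *d)
{
	return d->runtime_status == MOCK_RPM_SUSPENDED || d->is_suspended;
}
```

In this model, a disk put to sleep by system suspend whose host is being
removed (runtime status active, is_suspended set, host not running) is skipped
by the patched check and by the is_suspended variant, but not by the old
check. A disk that is awake while the error handler is active is skipped only
by the patched check, which is the case Bart is worried about.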

I will reply to the above question on v2 of this patch.

Bart.



