Re: [PATCH] ata,scsi: do not issue START STOP UNIT on resume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 31, 2023 at 09:39:56AM +0900, Damien Le Moal wrote:
> During system resume, ata_port_pm_resume() triggers ata EH to
> 1) Resume the controller
> 2) Reset and rescan the ports
> 3) Revalidate devices
> This EH execution is started asynchronously from ata_port_pm_resume(),
> which means that when sd_resume() is executed, none or only part of the
> above processing may have been executed. However, sd_resume() issues a
> START STOP UNIT to wake up the drive from sleep mode. This command is
> translated to ATA with ata_scsi_start_stop_xlat() and issued to the
> device. However, depending on the state of execution of the EH process
> and revalidation triggerred by ata_port_pm_resume(), two things may
> happen:
> 1) The START STOP UNIT fails if it is received before the controller has
>    been reenabled at the beginning of the EH execution. This is visible
>    with error messages like:
> 
> ata10.00: device reported invalid CHS sector 0
> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
> sd 9:0:0:0: PM: failed to resume async: error -5
> 
> 2) The START STOP UNIT command is received while the EH process is
>    on-going, which mean that it is stopped and must wait for its
>    completion, at which point the command is rather useless as the drive
>    is already fully spun up already. This case results also in a
>    significant delay in sd_resume() which is observable by users as
>    the entire system resume completion is delayed.
> 
> Given that ATA devices will be woken up by libata activity on resume,
> sd_resume() has no need to issue a START STOP UNIT command, which solves
> the above mentioned problems. Do not issue this command by introducing
> the new scsi_device flag no_start_on_resume and setting this flag to 1
> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP
> UNIT command only if this flag is not set.

Hi Damien,

Last week I noticed that a basic test in our validation started failing,
then I noticed that it was subsequent quick suspend and autoresume using
rtcwake that was problematic.

I couldn't collect any specific log that was pointing to some useful direction.
After a painful bisect I got to this patch. After reverting in from the
top of our tree, the tests are back to life.

The issue was that the subsequent quick suspend-resume (sometimes the
second, sometimes third or even sixth) was simply hanging the machine
in different points at Suspend.

So, maybe we have some kind of disks/configuration out there where this
start upon resume is needed? Maybe it is just a matter of timming to
ensure some firmware underneath is up and back to life?

Well, please let me know the best way to report this issue to you and what
kind of logs I should get.

Meanwhile if this ends up blocking our CI we can keep a revert in a
topic branch for CI.

Thanks,
Rodrigo.

> 
> Reported-by: Paul Ausbeck <paula@xxxxxxxxxxxx>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215880
> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
> Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
> ---
>  drivers/ata/libata-scsi.c  | 7 +++++++
>  drivers/scsi/sd.c          | 9 ++++++---
>  include/scsi/scsi_device.h | 1 +
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 370d18aca71e..c6ece32de8e3 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1100,7 +1100,14 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
>  		}
>  	} else {
>  		sdev->sector_size = ata_id_logical_sector_size(dev->id);
> +		/*
> +		 * Stop the drive on suspend but do not issue START STOP UNIT
> +		 * on resume as this is not necessary and may fail: the device
> +		 * will be woken up by ata_port_pm_resume() with a port reset
> +		 * and device revalidation.
> +		 */
>  		sdev->manage_start_stop = 1;
> +		sdev->no_start_on_resume = 1;
>  	}
>  
>  	/*
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 68b12afa0721..3c668cfb146d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3876,7 +3876,7 @@ static int sd_suspend_runtime(struct device *dev)
>  static int sd_resume(struct device *dev)
>  {
>  	struct scsi_disk *sdkp = dev_get_drvdata(dev);
> -	int ret;
> +	int ret = 0;
>  
>  	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
>  		return 0;
> @@ -3884,8 +3884,11 @@ static int sd_resume(struct device *dev)
>  	if (!sdkp->device->manage_start_stop)
>  		return 0;
>  
> -	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
> -	ret = sd_start_stop_device(sdkp, 1);
> +	if (!sdkp->device->no_start_on_resume) {
> +		sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
> +		ret = sd_start_stop_device(sdkp, 1);
> +	}
> +
>  	if (!ret)
>  		opal_unlock_from_suspend(sdkp->opal_dev);
>  	return ret;
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 75b2235b99e2..b9230b6add04 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -194,6 +194,7 @@ struct scsi_device {
>  	unsigned no_start_on_add:1;	/* do not issue start on add */
>  	unsigned allow_restart:1; /* issue START_UNIT in error handler */
>  	unsigned manage_start_stop:1;	/* Let HLD (sd) manage start/stop */
> +	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
>  	unsigned start_stop_pwr_cond:1;	/* Set power cond. in START_STOP_UNIT */
>  	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
>  	unsigned select_no_atn:1;
> -- 
> 2.41.0
> 



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux