Re: [PATCH v4 2/2] ata: libata: Defer rescan on suspended device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/05/03 0:04, Kai-Heng Feng wrote:
> During system resume, if an EH is schduled after ATA host is resumed
> (i.e. ATA_PFLAG_PM_PENDING cleared), but before the disk device is
> fully resumed, the device_lock hold by scsi_rescan_device() is never
> released so the dpm_resume() of the disk is blocked forerver.
> 
> That's because scsi_attach_vpd() is expecting the disk device is in
> operational state, as it doesn't work on suspended device.
> 
> To avoid such deadlock, defer rescan if the disk is still suspended so
> the resume process of the disk device can proceed. At the end of the
> resume process, use the complete() callback to schedule the rescan task.
> 
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> ---
> v4: 
>  - No change.
> 
> v3:
>  - New patch to resolve undefined pm_suspend_target_state.
> 
> v2:
>  - Schedule rescan task at the end of system resume phase.
>  - Wording.
> 
>  drivers/ata/libata-core.c | 11 +++++++++++
>  drivers/ata/libata-eh.c   | 11 +++++++++--
>  include/linux/libata.h    |  1 +
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 8bf612bdd61a..bdd244bdb8a2 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5093,6 +5093,16 @@ static int ata_port_pm_poweroff(struct device *dev)
>  	return 0;
>  }
>  
> +static void ata_port_pm_complete(struct device *dev)
> +{
> +	struct ata_port *ap = to_ata_port(dev);
> +
> +	if (ap->pflags & ATA_PFLAG_DEFER_RESCAN)
> +		schedule_work(&(ap->scsi_rescan_task));
> +
> +	ap->pflags &= ~ATA_PFLAG_DEFER_RESCAN;

Is this called with the port lock held ? Otherwise, there is a race with
ata_eh_revalidate_and_attach() and we may end up never actually revalidating the
drive. At the very least, I think that ATA_PFLAG_DEFER_RESCAN needs to be
cleared before calling schedule_work().

> +}
> +
>  static const unsigned int ata_port_resume_ehi = ATA_EHI_NO_AUTOPSY
>  						| ATA_EHI_QUIET;
>  
> @@ -5158,6 +5168,7 @@ static const struct dev_pm_ops ata_port_pm_ops = {
>  	.thaw = ata_port_pm_resume,
>  	.poweroff = ata_port_pm_poweroff,
>  	.restore = ata_port_pm_resume,
> +	.complete = ata_port_pm_complete,
>  
>  	.runtime_suspend = ata_port_runtime_suspend,
>  	.runtime_resume = ata_port_runtime_resume,
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index a6c901811802..0881b590fb7e 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -15,6 +15,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/export.h>
>  #include <linux/pci.h>
> +#include <linux/suspend.h>
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_host.h>
>  #include <scsi/scsi_eh.h>
> @@ -2983,8 +2984,14 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
>  			 */
>  			ehc->i.flags |= ATA_EHI_SETMODE;
>  
> -			/* schedule the scsi_rescan_device() here */
> -			schedule_work(&(ap->scsi_rescan_task));
> +			/* Schedule the scsi_rescan_device() here.

Code style: please start multi-line comment with a line starting with "/*"
without text after it.

> +			 * Defer the rescan if it's in process of
> +			 * suspending or resuming.
> +			 */
> +			if (pm_suspend_target_state != PM_SUSPEND_ON)

Why ? Shouldn't this be "pm_suspend_target_state == PM_SUSPEND_ON" ? Because if
the device is already resumed, why would we need to defer the rescan ?

> +				ap->pflags |= ATA_PFLAG_DEFER_RESCAN;
> +			else
> +				schedule_work(&(ap->scsi_rescan_task));
>  		} else if (dev->class == ATA_DEV_UNKNOWN &&
>  			   ehc->tries[dev->devno] &&
>  			   ata_class_enabled(ehc->classes[dev->devno])) {
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 311cd93377c7..1696c9ebd168 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -189,6 +189,7 @@ enum {
>  	ATA_PFLAG_UNLOADING	= (1 << 9), /* driver is being unloaded */
>  	ATA_PFLAG_UNLOADED	= (1 << 10), /* driver is unloaded */
>  
> +	ATA_PFLAG_DEFER_RESCAN	= (1 << 16), /* peform deferred rescan on system resume */

Do we really need a new flag ? Can't we use ATA_PFLAG_PM_PENDING correctly ?
>From the rather sparse commit message description, it sounds like this flag is
being cleared too early. Not sure though. Need to dig further into this.

>  	ATA_PFLAG_SUSPENDED	= (1 << 17), /* port is suspended (power) */
>  	ATA_PFLAG_PM_PENDING	= (1 << 18), /* PM operation pending */
>  	ATA_PFLAG_INIT_GTM_VALID = (1 << 19), /* initial gtm data valid */

-- 
Damien Le Moal
Western Digital Research




[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux