On 8/25/23 03:28, Rodrigo Vivi wrote:
> On Mon, Jul 31, 2023 at 09:39:56AM +0900, Damien Le Moal wrote:
>> During system resume, ata_port_pm_resume() triggers ata EH to
>> 1) Resume the controller
>> 2) Reset and rescan the ports
>> 3) Revalidate devices
>> This EH execution is started asynchronously from ata_port_pm_resume(),
>> which means that when sd_resume() is executed, none or only part of the
>> above processing may have been executed. However, sd_resume() issues a
>> START STOP UNIT to wake up the drive from sleep mode. This command is
>> translated to ATA with ata_scsi_start_stop_xlat() and issued to the
>> device. However, depending on the state of execution of the EH process
>> and revalidation triggered by ata_port_pm_resume(), two things may
>> happen:
>> 1) The START STOP UNIT fails if it is received before the controller has
>> been reenabled at the beginning of the EH execution. This is visible
>> with error messages like:
>>
>> ata10.00: device reported invalid CHS sector 0
>> sd 9:0:0:0: [sdc] Start/Stop Unit failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> sd 9:0:0:0: [sdc] Sense Key : Illegal Request [current]
>> sd 9:0:0:0: [sdc] Add. Sense: Unaligned write command
>> sd 9:0:0:0: PM: dpm_run_callback(): scsi_bus_resume+0x0/0x90 returns -5
>> sd 9:0:0:0: PM: failed to resume async: error -5
>>
>> 2) The START STOP UNIT command is received while the EH process is
>> on-going, which means that it is stopped and must wait for its
>> completion, at which point the command is rather useless as the drive
>> is already fully spun up. This case also results in a
>> significant delay in sd_resume() which is observable by users as
>> the entire system resume completion is delayed.
>>
>> Given that ATA devices will be woken up by libata activity on resume,
>> sd_resume() has no need to issue a START STOP UNIT command, which solves
>> the above-mentioned problems.
>> Do not issue this command by introducing
>> the new scsi_device flag no_start_on_resume and setting this flag to 1
>> in ata_scsi_dev_config(). sd_resume() is modified to issue a START STOP
>> UNIT command only if this flag is not set.
>
> Hi Damien,
>
> Last week I noticed that a basic test in our validation started failing,
> then I noticed that it was subsequent quick suspend and autoresume using
> rtcwake that was problematic.
>
> I couldn't collect any specific log that was pointing to some useful direction.
> After a painful bisect I got to this patch. After reverting it from the
> top of our tree, the tests are back to life.
>
> The issue was that the subsequent quick suspend-resume (sometimes the
> second, sometimes third or even sixth) was simply hanging the machine
> at different points during suspend.
>
> So, maybe we have some kind of disks/configuration out there where this
> start upon resume is needed? Maybe it is just a matter of timing to
> ensure some firmware underneath is up and back to life?
>
> Well, please let me know the best way to report this issue to you and what
> kind of logs I should get.
>
> Meanwhile, if this ends up blocking our CI, we can keep a revert in a
> topic branch for CI.

Can you try applying the patch attached to this email?

Thanks.

-- 
Damien Le Moal
Western Digital Research
From 20b636494f9c98bbca50d6b2fb1235f47476cdb4 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal@xxxxxxxxxx>
Date: Fri, 25 Aug 2023 15:41:14 +0900
Subject: [PATCH] ata: libata-scsi: link ata port and scsi device

There is no direct ancestry between an ata_device and its scsi device.
This prevents the power management code from correctly ordering suspend
and resume operations, and additional code is needed to handle this
correctly. Create such an ancestry to allow simplifying the code.

The parent-child (supplier-consumer) relationship is established between
the ata_port (parent) and the scsi device (child) with device_link_add().
The ata_port is used as the parent rather than the ata_device because PM
operations are defined per port and the status of devices is controlled
from these port operations.

The device link is established with the new function ata_scsi_dev_alloc().
This function is called from ata_scsi_slave_alloc(), which is used to
define the ->slave_alloc callback of the scsi host template of all drivers.
The call to ata_scsi_sdev_config() is also moved from
ata_scsi_slave_config() to ata_scsi_dev_alloc().

Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
---
 drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++++++++++++-----
 drivers/ata/libata.h      |  1 +
 drivers/ata/pata_macio.c  |  1 +
 drivers/ata/sata_mv.c     |  1 +
 drivers/ata/sata_nv.c     |  2 ++
 drivers/ata/sata_sil24.c  |  1 +
 include/linux/libata.h    |  3 +++
 7 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index c6ece32de8e3..ab572cc9b3f9 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1139,6 +1139,45 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 	return 0;
 }
 
+int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap)
+{
+	struct device_link *link;
+
+	ata_scsi_sdev_config(sdev);
+
+	/*
+	 * Create a link from the ata_port device to the scsi device to ensure
+	 * that PM does suspend/resume in the correct order: the scsi device is
+	 * consumer (child) and the ata port the supplier (parent).
+	 */
+	link = device_link_add(&sdev->sdev_gendev, &ap->tdev,
+			       DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
+	if (!link) {
+		ata_port_err(ap, "Failed to create link to scsi device %s\n",
+			     dev_name(&sdev->sdev_gendev));
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+/**
+ * ata_scsi_slave_alloc - Early setup of SCSI device
+ * @sdev: SCSI device to examine
+ *
+ * This is called from scsi_alloc_sdev() when the scsi device
+ * associated with an ATA device is scanned on a port.
+ *
+ * LOCKING:
+ * Defined by SCSI layer.  We don't really care.
+ */
+
+int ata_scsi_slave_alloc(struct scsi_device *sdev)
+{
+	return ata_scsi_dev_alloc(sdev, ata_shost_to_port(sdev->host));
+}
+EXPORT_SYMBOL_GPL(ata_scsi_slave_alloc);
+
 /**
  * ata_scsi_slave_config - Set SCSI device attributes
  * @sdev: SCSI device to examine
@@ -1155,14 +1194,11 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 {
 	struct ata_port *ap = ata_shost_to_port(sdev->host);
 	struct ata_device *dev = __ata_scsi_find_dev(ap, sdev);
-	int rc = 0;
-
-	ata_scsi_sdev_config(sdev);
 
 	if (dev)
-		rc = ata_scsi_dev_config(sdev, dev);
+		return ata_scsi_dev_config(sdev, dev);
 
-	return rc;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index cf993885d2b2..c45cac2a3631 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -113,6 +113,7 @@ extern struct ata_device *ata_scsi_find_dev(struct ata_port *ap,
 extern int ata_scsi_add_hosts(struct ata_host *host,
 			      const struct scsi_host_template *sht);
 extern void ata_scsi_scan_host(struct ata_port *ap, int sync);
+extern int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap);
 extern int ata_scsi_offline_dev(struct ata_device *dev);
 extern bool ata_scsi_sense_is_valid(u8 sk, u8 asc, u8 ascq);
 extern void ata_scsi_set_sense(struct ata_device *dev,
diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
index 17f6ccee53c7..32968b4cf8e4 100644
--- a/drivers/ata/pata_macio.c
+++ b/drivers/ata/pata_macio.c
@@ -918,6 +918,7 @@ static const struct scsi_host_template pata_macio_sht = {
 	 * use 64K minus 256
 	 */
 	.max_segment_size	= MAX_DBDMA_SEG,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= pata_macio_slave_config,
 	.sdev_groups		= ata_common_sdev_groups,
 	.can_queue		= ATA_DEF_QUEUE,
diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
index d404e631d152..37a0bbaa8341 100644
--- a/drivers/ata/sata_mv.c
+++ b/drivers/ata/sata_mv.c
@@ -673,6 +673,7 @@ static const struct scsi_host_template mv6_sht = {
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth	= ata_scsi_change_queue_depth,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= ata_scsi_slave_config
 };
 
diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index abf5651c87ab..77193940b3f2 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -380,6 +380,7 @@ static const struct scsi_host_template nv_adma_sht = {
 	.can_queue		= NV_ADMA_MAX_CPBS,
 	.sg_tablesize		= NV_ADMA_SGTBL_TOTAL_LEN,
 	.dma_boundary		= NV_ADMA_DMA_BOUNDARY,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= nv_adma_slave_config,
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth	= ata_scsi_change_queue_depth,
@@ -391,6 +392,7 @@ static const struct scsi_host_template nv_swncq_sht = {
 	.can_queue		= ATA_MAX_QUEUE - 1,
 	.sg_tablesize		= LIBATA_MAX_PRD,
 	.dma_boundary		= ATA_DMA_BOUNDARY,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= nv_swncq_slave_config,
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth	= ata_scsi_change_queue_depth,
diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
index e72a0257990d..ed09d653741f 100644
--- a/drivers/ata/sata_sil24.c
+++ b/drivers/ata/sata_sil24.c
@@ -381,6 +381,7 @@ static const struct scsi_host_template sil24_sht = {
 	.tag_alloc_policy	= BLK_TAG_ALLOC_FIFO,
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth	= ata_scsi_change_queue_depth,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= ata_scsi_slave_config
 };
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 820f7a3a2749..590ded8e319d 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1151,6 +1151,7 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
 			      struct block_device *bdev,
 			      sector_t capacity, int geom[]);
 extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
+extern int ata_scsi_slave_alloc(struct scsi_device *sdev);
 extern int ata_scsi_slave_config(struct scsi_device *sdev);
 extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
 extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
@@ -1413,12 +1414,14 @@ extern const struct attribute_group *ata_common_sdev_groups[];
 	__ATA_BASE_SHT(drv_name),				\
 	.can_queue		= ATA_DEF_QUEUE,		\
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,		\
+	.slave_alloc		= ata_scsi_slave_alloc,		\
 	.slave_configure	= ata_scsi_slave_config
 
 #define ATA_SUBBASE_SHT_QD(drv_name, drv_qd)			\
 	__ATA_BASE_SHT(drv_name),				\
 	.can_queue		= drv_qd,			\
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,		\
+	.slave_alloc		= ata_scsi_slave_alloc,		\
 	.slave_configure	= ata_scsi_slave_config
 
 #define ATA_BASE_SHT(drv_name)					\
-- 
2.41.0