Re: [PATCH v8 04/23] scsi: sd: Differentiate system and runtime start/stop management

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Damien Le Moal <dlemoal@xxxxxxxxxx> writes:

> On 10/21/23 06:23, Phillip Susi wrote:
> Looks like a deadlock somewhere, likely with the device remove that you
> triggered with the "echo 1 > /sys/block/sdf/device/delete".
>
> Can you send the exact list of commands & events you executed to get to that
> point ? Also please share your kernel config.

I wrote auto to /sys/block/sd[ef]/device/power/config and 10000 to
/sys/block/sd[ef]/device/power/auto_suspend_delay_ms, and auto to the
control file of their corresponding ata8 port ( both drives are behind
an PMP in an eSATA dock on ata8 ).  As I said before, it the dmesg
showed that the ata port only suspended after BOTH drives had suspended,
which is as it should be.  The problem seems to be after I echo 1 >
/sys/block/sdf/device/delete, when this happens:

Oct 26 16:43:00 faldara kernel: ata8.15: failed to read PMP GSCR[0] (Emask=0x100)
Oct 26 16:43:00 faldara kernel: ata8.15: PMP revalidation failed (errno=-5)
Oct 26 16:43:05 faldara kernel: ata8.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 26 16:43:05 faldara kernel: ata8.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Oct 26 16:43:05 faldara kernel: ata8.01: SATA link up 3.0 Gbps (SStatus
123 SControl 320)

Then I get the hung task.  I reverted the PuiS patch that I have been
working on, and the hang no longer happens, however, the above errors do
still happen.  I think that is a problem in itself, and may or may not
be related to the hang.  I see no reason why the PMP revalidation should
fail after the two disks and the port are runtime pm suspended, but for
some reason, with my patch applied, that then leads to the hang.

Here is my git log showing your two patches applied on the upstream
kernel, plus my patch:

commit bb5db8bb505171fd2b8b67c3f22b16d8355d2a8b
Author: Phillip Susi <phill@xxxxxxxxxxxx>
Date:   Mon Oct 16 17:03:51 2023 -0400

    Olibata: don't start disks on resume
    
    Disks with Power Up In Standby enabled that required the
    SET FEATURES command to start up were being issued the
    command during resume.  Suppress this until the disk
    is actually accessed.

commit 4f1a1a9da4ff833417fa520d097b3f07c20e3c5d
Author: Damien Le Moal <dlemoal@xxxxxxxxxx>
Date:   Thu Oct 12 16:18:00 2023 +0900

    [PATCH 2/2] ata: libata-core: Improve ata_dev_power_set_active()
    
    Improve the function ata_dev_power_set_active() by having it do nothing
    for a disk that is already in the active power state. To do that,
    introduce the function ata_dev_power_is_active() to test the current
    power state of the disk and return true if the disk is in the PM0:
    active or PM1: idle state (0xff value for the count field of the CHECK
    POWER MODE command output).
    
    To preserve the existing behavior, if the CHECK POWER MODE command
    issued in ata_dev_power_is_active() fails, the drive is assumed to be in
    standby mode and false is returned.
    
    With this change, issuing the VERIFY command to access the disk media to
    spin it up becomes unnecessary most of the time during system resume as
    the port reset done by libata-eh on resume often result in the drive to
    spin-up (this behavior is not clearly defined by the ACS specifications
    and may thus vary between disk models).
    
    Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>

commit bb7e1fcd9e3a207820e4b828e9f4f02986942d8d
Author: Damien Le Moal <dlemoal@xxxxxxxxxx>
Date:   Thu Oct 12 16:17:59 2023 +0900

    ata: libata-eh: Spinup disk on resume after revalidation
    
    Move the call to ata_dev_power_set_active() to transition a disk in
    standby power mode to the active power mode from
    ata_eh_revalidate_and_attach() before doing revalidation to the end of
    ata_eh_recover(), after the link speed for the device is reconfigured
    (if that was necessary). This is safer as this ensure that the VERIFY
    command executed to spinup the disk is executed with the drive properly
    reconfigured first.
    
    Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>

commit 727fb83765049981e342db4c5a8b51aca72201d8
Merge: 8cb1f10d8c4b 5c15c60e7be6
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Fri Oct 13 23:19:16 2023 -0700

    Merge tag 'input-for-v6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
    
    Pull input fixes from Dmitry Torokhov:
    
     - a reworked way for handling reset delay on SMBus-connected Synaptics
       touchpads (the original one, while being correct, uncovered an old
       bug in fallback to PS/2 code that was fixed separately; the new one
       however avoids having delay in serio port "fast" resume, and instead
       has the wait in the RMI4 code)
    
     - a fix for potential crashes when devices with Elan controllers (and
       Synaptics) fall back to PS/2 code. Can't be hit without the original
       patch above, but still good to have it fixed
    
     - a couple new device IDs in xpad Xbox driver
    
     - another quirk for Goodix driver to deal with stuff vendors put in
       ACPI tables
    
     - a fix for use-after-free on disconnect for powermate driver
    
     - a quirk to not initialize PS/2 mouse port on Fujitsu Lifebook E5411
       laptop as it makes keyboard not usable and the device uses
       hid-over-i2c touchpad anyways
    
    * tag 'input-for-v6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
      Input: powermate - fix use-after-free in powermate_config_complete
      Input: xpad - add PXN V900 support
      Input: synaptics-rmi4 - handle reset delay when using SMBus trsnsport
      Input: psmouse - fix fast_reconnect function for PS/2 mode
      Revert "Input: psmouse - add delay when deactivating for SMBus mode"
      Input: goodix - ensure int GPIO is in input for gpio_count == 1 && gpio_int_idx == 0 case
      Input: i8042 - add Fujitsu Lifebook E5411 to i8042 quirk table
      Input: xpad - add HyperX Clutch Gladiate Support
And here is my patch, which basically checks for PuiS during system
resume, and either forces the disk to suspend or resume depending on
whether it is PuiS.  Since I have not even engaged in any system
suspend/resume for this test, this patch should not have any effect.

So far in my testing of this patch on my 3 internal drives that I have
enabled PuiS on, it appears to work in so much as after a
suspend/resume, the runtime status of the disks is suspended ( as long
as I have *enabled* runtime pm on them, otherwise not ).  Another
problem seems to be that while the disks are left suspended after system
resume, they very quickly are resumed due to begnign IO eminating from
either ext4 or udsisks2 that does not cause a drive to spin up when it
is suspended with hdparm -y.  This would be the case of either FLUSH
CASH or CHECK POWER CONDITION not causing the drive to spin itself up
when given the commands, but the runtime pm core does not know that
these commands can be completed without resuming the disk, so it resumes
them first.

>From 77a3511d4058e3afccc4ba745c8c6ad6f7ac07a3 Mon Sep 17 00:00:00 2001
From: Phillip Susi <psusi@xxxxxxxxxx>
Date: Mon, 16 Dec 2013 18:30:55 -0500
Subject: [PATCH] libata: don't start disks on resume

Disks with Power Up In Standby enabled that required the
SET FEATURES command to start up were being issued the
command during resume.  Suppress this until the disk
is actually accessed.
---
 drivers/ata/libata-core.c | 25 +++++++++++++++++++++----
 drivers/ata/libata-eh.c   | 19 ++++++++++++++++++-
 drivers/ata/libata.h      |  1 +
 include/linux/libata.h    |  1 +
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 7c261907a1d0..cd4718fe02e1 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1912,7 +1912,20 @@ int ata_dev_read_id(struct ata_device *dev, unsigned int *p_class,
 			goto err_out;
 	}
 
-	if (!tried_spinup && (id[2] == 0x37c8 || id[2] == 0x738c)) {
+	if (flags & ATA_READID_NOSTART && id[2] == 0x37c8)
+	{
+		/*
+		 * the drive has powered up in standby, and returned incomplete IDENTIFY info
+		 * so we can't revalidate it yet.  We have also been asked to avoid starting the
+		 * drive, so stop  here and leave the drive asleep and the EH pending, to be
+		 * restarted on later IO request
+		 */
+		dev->flags |= ATA_DFLAG_SLEEPING;
+		return -EAGAIN;
+	}
+
+	if (!(flags & ATA_READID_NOSTART) &&
+	    !tried_spinup && (id[2] == 0x37c8 || id[2] == 0x738c)) {
 		tried_spinup = 1;
 		/*
 		 * Drive powered-up in standby mode, and requires a specific
@@ -3956,6 +3969,8 @@ int ata_dev_revalidate(struct ata_device *dev, unsigned int new_class,
 
 	/* re-read ID */
 	rc = ata_dev_reread_id(dev, readid_flags);
+	if (rc == -EAGAIN)
+		return rc;
 	if (rc)
 		goto fail;
 
@@ -5172,6 +5187,10 @@ static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 		spin_lock_irqsave(ap->lock, flags);
 	}
 
+	/* on system resume, don't wake PuiS drives */
+	if (mesg.event == PM_EVENT_RESUME)
+		ehi_flags |= ATA_EHI_NOSTART;
+	
 	/* Request PM operation to EH */
 	ap->pm_mesg = mesg;
 	ap->pflags |= ATA_PFLAG_PM_PENDING;
@@ -5269,9 +5288,7 @@ static void ata_port_resume_async(struct ata_port *ap, pm_message_t mesg)
 static int ata_port_pm_resume(struct device *dev)
 {
 	ata_port_resume_async(to_ata_port(dev), PMSG_RESUME);
-	pm_runtime_disable(dev);
-	pm_runtime_set_active(dev);
-	pm_runtime_enable(dev);
+	printk(KERN_INFO "resume device: %p", dev);
 	return 0;
 }
 
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 09be31723a3c..e77805ba46b0 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -22,6 +22,7 @@
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_dbg.h>
 #include "../scsi/scsi_transport_api.h"
+#include <linux/pm_runtime.h>
 
 #include <linux/libata.h>
 
@@ -3042,6 +3043,8 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
 
 		if (ehc->i.flags & ATA_EHI_DID_RESET)
 			readid_flags |= ATA_READID_POSTRESET;
+		if (ehc->i.flags & ATA_EHI_NOSTART)
+			readid_flags |= ATA_READID_NOSTART;
 
 		if ((action & ATA_EH_REVALIDATE) && ata_dev_enabled(dev)) {
 			WARN_ON(dev->class == ATA_DEV_PMP);
@@ -3071,9 +3074,23 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
 			ata_eh_about_to_do(link, dev, ATA_EH_REVALIDATE);
 			rc = ata_dev_revalidate(dev, ehc->classes[dev->devno],
 						readid_flags);
-			if (rc)
+			if (rc == -EAGAIN) {
+				rc = 0; /* start required but suppressed, handle later */
+				printk(KERN_INFO "sdev: %p", &dev->sdev->sdev_dev);
+				pm_runtime_disable(&dev->sdev->sdev_dev);
+				pm_runtime_set_suspended(&dev->sdev->sdev_dev);
+				pm_runtime_enable(&dev->sdev->sdev_dev);
+
+				continue;
+			}
+			else if (rc)
 				goto err;
 
+			printk(KERN_INFO "sdev: %p", &dev->sdev->sdev_dev);
+			pm_runtime_disable(&dev->sdev->sdev_dev);
+			pm_runtime_set_active(&dev->sdev->sdev_dev);
+			pm_runtime_enable(&dev->sdev->sdev_dev);
+
 			ata_eh_done(link, dev, ATA_EH_REVALIDATE);
 
 			/* Configuration may have changed, reconfigure
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 05ac80da8ebc..cc13777d2811 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -19,6 +19,7 @@
 enum {
 	/* flags for ata_dev_read_id() */
 	ATA_READID_POSTRESET	= (1 << 0), /* reading ID after reset */
+	ATA_READID_NOSTART	= (1 << 1), /* do not start drive */
 
 	/* selector for ata_down_xfermask_limit() */
 	ATA_DNXFER_PIO		= 0,	/* speed down PIO */
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 2a7d2af0ed80..77769351ab99 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -328,6 +328,7 @@ enum {
 
 	/* ata_eh_info->flags */
 	ATA_EHI_HOTPLUGGED	= (1 << 0),  /* could have been hotplugged */
+	ATA_EHI_NOSTART		= (1 << 1),  /* don't start the disk */
 	ATA_EHI_NO_AUTOPSY	= (1 << 2),  /* no autopsy */
 	ATA_EHI_QUIET		= (1 << 3),  /* be quiet */
 	ATA_EHI_NO_RECOVERY	= (1 << 4),  /* no recovery */
-- 
2.41.0


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux