Re: [PATCH v9 3/5] libata: support concurrent positioning ranges log

On Tue, 2021-11-02 at 15:02 +0100, Geert Uytterhoeven wrote:
> Hi Damien,
> 
> On Tue, Nov 2, 2021 at 12:42 PM Damien Le Moal
> <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
> > On 2021/11/02 19:40, Geert Uytterhoeven wrote:
> > > On Wed, 27 Oct 2021, Damien Le Moal wrote:
> > > > Add support to discover if an ATA device supports the Concurrent
> > > > Positioning Ranges data log (address 0x47), indicating that the device
> > > > is capable of seeking to multiple different locations in parallel using
> > > > multiple actuators serving different LBA ranges.
> > > > 
> > > > Also add support to translate the concurrent positioning ranges log
> > > > into its equivalent Concurrent Positioning Ranges VPD page B9h in
> > > > libata-scsi.c.
> > > > 
> > > > The format of the Concurrent Positioning Ranges Log is defined in ACS-5
> > > > r9.
> > > > 
> > > > Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
> > > 
> > > Thanks for your patch, which is now commit fe22e1c2f705676a ("libata:
> > > support concurrent positioning ranges log") upstream.
> > > 
> > > During resume from s2ram on Renesas Salvator-XS, I now see more scary
> > > messages than before:
> > > 
> > >       ata1: link resume succeeded after 1 retries
> > >       ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > >      +ata1.00: qc timeout (cmd 0x2f)
> > >      +ata1.00: Read log page 0x00 failed, Emask 0x4
> > >      +ata1.00: ATA Identify Device Log not supported
> > >      +ata1.00: failed to set xfermode (err_mask=0x40)
> > >       ata1: link resume succeeded after 1 retries
> > >       ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > >      +ata1.00: ATA Identify Device Log not supported
> > >      +ata1.00: ATA Identify Device Log not supported
> > >       ata1.00: configured for UDMA/133
> > > 
> > > I guess this is expected?
> > 
> > Nope, it is not. The problem is actually not the concurrent positioning log, or
> > any other log, being supported or not.
> > 
> > Notice the qc timeout? On device scan after coming out of sleep, or even simply
> > doing a rmmod ahci+modprobe ahci, the read log commands issued during device
> > revalidate time out fairly easily as they are issued while the drive is not
> > necessarily fully restarted yet. These errors happen fairly easily due to the
> > command timeout setting in libata being too short, I think, for the "restart"
> > case. On a clean boot, they do not happen as longer timeouts are used in that case.
> > 
> > I identified this problem recently while testing stuff: I was doing rmmod of ata
> > modules and then modprobe of newly compiled modules for tests and noticed these
> > timeouts. Increasing the timeout values, they disappear. I am however still
> > scratching my head about the best way to address this. Still digging about this
> > to first make sure this is really about timeouts being set too short.
> 
> There's indeed something timing-related going on.  Sometimes I get
> during resume (s2idle or s2ram):
> 
>     ata1.00: qc timeout (cmd 0x2f)
>     ata1.00: Read log page 0x00 failed, Emask 0x4
>     ata1.00: ATA Identify Device Log not supported
>     ata1.00: failed to set xfermode (err_mask=0x40)
>     ata1.00: limiting speed to UDMA/133:PIO3
>     ata1: link resume succeeded after 1 retries
>     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     ata1.00: NODEV after polling detection
>     ata1.00: revalidation failed (errno=-2)
>     ata1.00: disabled
>     ata1: link resume succeeded after 1 retries
>     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK
>     sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK
>     sd 0:0:0:0: [sda] Sense not available.
>     sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK
>     sd 0:0:0:0: [sda] Sense not available.
>     sd 0:0:0:0: [sda] Sense not available.
>     sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
>     sda: detected capacity change from 320173056 to 0
> 
> after which the drive is no longer functional...

Geert,

Could you try with the following patch applied, to see if the problem goes away?


diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bf9c4b6c5c3d..e53f4ea71d38 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -93,6 +93,12 @@ static const unsigned long ata_eh_identify_timeouts[] = {
 	ULONG_MAX,
 };
 
+static const unsigned long ata_eh_revalidate_timeouts[] = {
+	15000,	/* Some drives are slow to read log pages when waking-up */
+	15000,  /* combined time till here is enough even for media access */
+	ULONG_MAX,
+};
+
 static const unsigned long ata_eh_flush_timeouts[] = {
 	15000,	/* be generous with flush */
 	15000,  /* ditto */
@@ -129,16 +135,17 @@ static const struct ata_eh_cmd_timeout_ent
 ata_eh_cmd_timeout_table[ATA_EH_CMD_TIMEOUT_TABLE_SIZE] = {
 	{ .commands = CMDS(ATA_CMD_ID_ATA, ATA_CMD_ID_ATAPI),
 	  .timeouts = ata_eh_identify_timeouts, },
-	{ .commands = CMDS(ATA_CMD_READ_NATIVE_MAX, ATA_CMD_READ_NATIVE_MAX_EXT),
-	  .timeouts = ata_eh_other_timeouts, },
-	{ .commands = CMDS(ATA_CMD_SET_MAX, ATA_CMD_SET_MAX_EXT),
-	  .timeouts = ata_eh_other_timeouts, },
-	{ .commands = CMDS(ATA_CMD_SET_FEATURES),
-	  .timeouts = ata_eh_other_timeouts, },
-	{ .commands = CMDS(ATA_CMD_INIT_DEV_PARAMS),
-	  .timeouts = ata_eh_other_timeouts, },
+	{ .commands = CMDS(ATA_CMD_READ_LOG_EXT, ATA_CMD_READ_LOG_DMA_EXT),
+	  .timeouts = ata_eh_revalidate_timeouts, },
 	{ .commands = CMDS(ATA_CMD_FLUSH, ATA_CMD_FLUSH_EXT),
 	  .timeouts = ata_eh_flush_timeouts },
+	{ .commands = CMDS(ATA_CMD_READ_NATIVE_MAX,
+			   ATA_CMD_READ_NATIVE_MAX_EXT,
+			   ATA_CMD_SET_MAX,
+			   ATA_CMD_SET_MAX_EXT,
+			   ATA_CMD_SET_FEATURES,
+			   ATA_CMD_INIT_DEV_PARAMS),
+	  .timeouts = ata_eh_other_timeouts, },
 };
 #undef CMDS
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 236ec689056a..f2b12057ffcf 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -394,7 +394,7 @@ enum {
 	/* This should match the actual table size of
 	 * ata_eh_cmd_timeout_table in libata-eh.c.
 	 */
-	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 6,
+	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 4,
 
 	/* Horkage types. May be set by libata or controller on drives
 	   (some horkage may be drive/controller pair dependent */


On my test box, I can reliably generate the same qc timeout errors you are
seeing by doing:

rmmod sd_mod ahci libahci libata
modprobe ahci

The first command hard resets the drives (causing them to "reboot"). When the
second command runs, revalidation is executed while the drives are still slow to
respond to read log commands. The patch adds an entry for the read log commands
to the EH command timeout table, so that they get a 15s timeout instead of the
default 5s. With that, all the timeout errors disappear. Note that these timeout
values are totally arbitrary...
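
To illustrate how such a per-command timeout table can be consulted, here is a
minimal user-space sketch. It is not the libata code: the names
(lookup_category, cmd_timeout_ms, timeout_table, DEFAULT_TIMEOUT_MS) are
hypothetical, and only the read log entry from the patch above is modeled. The
opcodes 0x2f and 0x47 are READ LOG EXT and READ LOG DMA EXT, and the 5s default
is the one mentioned above.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: commands without a table entry get the 5s default. */
#define DEFAULT_TIMEOUT_MS 5000UL

/* Zero-terminated opcode list: READ LOG EXT, READ LOG DMA EXT. */
static const unsigned char read_log_cmds[] = { 0x2f, 0x47, 0 };

/* Timeouts in ms, one per attempt; ULONG_MAX marks the end of the list. */
static const unsigned long revalidate_timeouts[] = { 15000, 15000, ULONG_MAX };

struct timeout_ent {
	const unsigned char *commands;	/* zero-terminated opcode list */
	const unsigned long *timeouts;	/* ULONG_MAX-terminated list */
};

/* Hypothetical table holding only the entry added by the patch. */
static const struct timeout_ent timeout_table[] = {
	{ read_log_cmds, revalidate_timeouts },
};

#define TABLE_SIZE (sizeof(timeout_table) / sizeof(timeout_table[0]))

/* Return the index of the entry listing cmd, or -1 if there is none. */
static int lookup_category(unsigned char cmd)
{
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		for (const unsigned char *c = timeout_table[i].commands; *c; c++) {
			if (*c == cmd)
				return (int)i;
		}
	}
	return -1;
}

/* Timeout in ms for the given attempt (0 is the first try). */
static unsigned long cmd_timeout_ms(unsigned char cmd, unsigned int attempt)
{
	int ent = lookup_category(cmd);
	const unsigned long *t;
	unsigned int idx = 0;

	if (ent < 0)
		return DEFAULT_TIMEOUT_MS;

	t = timeout_table[ent].timeouts;
	/* Move to the next timeout per retry, never onto the sentinel. */
	while (idx < attempt && t[idx + 1] != ULONG_MAX)
		idx++;
	return t[idx];
}
```

With this sketch, cmd_timeout_ms(0x2f, 0) returns 15000, while a command that
has no entry in the table, e.g. 0xef, falls back to the 5000 ms default.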


> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

-- 
Damien Le Moal
Western Digital Research




