Re: SATA LPM issue - ATA error in logs 'frozen'

Hi,

Thank you for reporting this.

On 23-04-18 03:03, Kevin Shanahan wrote:
Hi,

After upgrading the kernel from 4.15.9-1 to 4.16.3-1 (Arch Linux) my
router started responding very slowly.  These messages were repeatedly
showing up in the logs:

   Apr 23 10:21:43 link kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
   Apr 23 10:21:43 link kernel: ata1: SError: { PHYRdyChg CommWake }
   Apr 23 10:21:43 link kernel: ata1.00: failed command: WRITE DMA
   Apr 23 10:21:43 link kernel: ata1.00: cmd ca/00:08:60:5d:cd/00:00:00:00:00/e1 tag 9 dma 4096 out
                                         res 50/01:01:01:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
   Apr 23 10:21:43 link kernel: ata1.00: status: { DRDY }
   Apr 23 10:21:43 link kernel: ata1.00: error: { AMNF }
   Apr 23 10:21:43 link kernel: ata1: hard resetting link
   Apr 23 10:21:43 link kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
   Apr 23 10:21:43 link kernel: ata1.00: configured for UDMA/133
   Apr 23 10:21:43 link kernel: ata1: EH complete
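
For readers following the log: the SErr value 0x50000 has bits 16 and 18
set, which are the two diagnostic bits the kernel prints as PHYRdyChg and
CommWake on the next line. A minimal user-space sketch of that decoding
follows. The function and table names are hypothetical, and only the two
bits seen in this log are listed; the other SError bits are left out.

```c
#include <stdint.h>
#include <string.h>

/* One SError bit and the short name used in the log above. */
struct serr_bit {
	uint32_t mask;
	const char *name;
};

/* Only the two bits that appear in this report; not a complete table. */
static const struct serr_bit serr_bits[] = {
	{ UINT32_C(1) << 16, "PHYRdyChg" },
	{ UINT32_C(1) << 18, "CommWake" },
};

/*
 * Write the names of the known bits set in serr into buf, separated by
 * single spaces. Returns the number of names written.
 */
int sketch_decode_serr(uint32_t serr, char *buf, size_t len)
{
	size_t i;
	int n = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(serr_bits) / sizeof(serr_bits[0]); i++) {
		if (!(serr & serr_bits[i].mask))
			continue;
		if (n > 0)
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, serr_bits[i].name, len - strlen(buf) - 1);
		n++;
	}
	return n;
}
```

Calling sketch_decode_serr(0x50000, buf, sizeof(buf)) with a 64-byte buffer
fills it with the same two names shown in the SError line.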

I noticed that the SATA LPM states had now been enabled, so I tried
changing from 'med_power_with_dipm' to 'medium_power' and the problem
went away:

   echo medium_power > /sys/class/scsi_host/host0/link_power_management_policy

Perhaps there is something about my combination of controller/drive that is not compatible?

I guess so. I'm somewhat surprised by this, because Samsung SSDs tend to be
well behaved, but this is an mSATA version, which may have some firmware
differences from the regular 2.5" models, and the PM830 SSD has many OEM
firmware customizations.

So based on that, I think a narrow quirk targeting your specific firmware
version is the best solution for now.
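
To illustrate what "narrow" means here: a quirk entry that names both the
model and the firmware revision only applies when both strings match, so
the same SSD model with a different firmware is left alone. Below is a
rough user-space sketch of that lookup. It is not the libata code: the
struct, flag and function names are hypothetical, and fnmatch(3) is used
as a stand-in for the kernel's own pattern matching.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch; not the kernel's definition. */
#define SKETCH_QUIRK_NOLPM (1u << 0)

struct sketch_quirk {
	const char *model;	/* pattern for the model string */
	const char *fw_rev;	/* pattern for the firmware revision, NULL = any */
	unsigned int flags;
};

static const struct sketch_quirk sketch_quirks[] = {
	/* Model and firmware revision taken from the hdparm output in this thread. */
	{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", SKETCH_QUIRK_NOLPM },
	{ NULL, NULL, 0 }	/* terminator */
};

/* Return the flags of the first entry matching both strings, or 0 if none. */
unsigned int sketch_lookup_quirks(const char *model, const char *fw_rev)
{
	const struct sketch_quirk *q;

	for (q = sketch_quirks; q->model; q++) {
		if (fnmatch(q->model, model, 0) != 0)
			continue;
		if (q->fw_rev == NULL || fnmatch(q->fw_rev, fw_rev, 0) == 0)
			return q->flags;
	}
	return 0;
}
```

With this table, looking up the reported model with firmware CXM14M1Q
returns the no-LPM flag, while the same model with any other firmware
string returns 0.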

I've attached a patch for this. Can you build an Arch kernel with that patch
added and see if that fixes things without you needing to change anything
manually?  /sys/class/scsi_host/host0/link_power_management_policy should now
default to max_performance for your SSD. Note that there are almost no power
savings when going from max_performance to medium_power, so we simply disable
LPM altogether on models which have issues with med_power_with_dipm.

May I ask what motherboard your router is using?

Regards,

Hans



# hdparm -i /dev/sda

/dev/sda:

  Model=SAMSUNG MZMPC128HBFU-000MV, FwRev=CXM14M1Q, SerialNo=S19FNYAD394414
  Config={ Fixed }
  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
  BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
  CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=250069680
  IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
  PIO modes:  pio0 pio1 pio2 pio3 pio4
  DMA modes:  mdma0 mdma1 mdma2
  UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
  AdvancedPM=no WriteCache=enabled
  Drive conforms to: unknown:  ATA/ATAPI-2,3,4,5,6,7

# cat /proc/cpuinfo | grep "model name"
model name	    : Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
model name	    : Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
model name	    : Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
model name	    : Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz

# cat /sys/class/scsi_device/0\:0\:0\:0/device/model
SAMSUNG MZMPC128

Regards,
Kevin Shanahan.

From 7f63aa54bf722b0585c5521d4728279d3d8fa40f Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@xxxxxxxxxx>
Date: Mon, 23 Apr 2018 09:27:28 +0200
Subject: [PATCH] libata: Apply NOLPM quirk for SAMSUNG MZMPC128HBFU-000MV SSD

Kevin Shanahan reports the following repeating errors when using LPM,
causing long delays accessing the disk:

  Apr 23 10:21:43 link kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
  Apr 23 10:21:43 link kernel: ata1: SError: { PHYRdyChg CommWake }
  Apr 23 10:21:43 link kernel: ata1.00: failed command: WRITE DMA
  Apr 23 10:21:43 link kernel: ata1.00: cmd ca/00:08:60:5d:cd/00:00:00:00:00/e1 tag 9 dma 4096 out
                                        res 50/01:01:01:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
  Apr 23 10:21:43 link kernel: ata1.00: status: { DRDY }
  Apr 23 10:21:43 link kernel: ata1.00: error: { AMNF }
  Apr 23 10:21:43 link kernel: ata1: hard resetting link
  Apr 23 10:21:43 link kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
  Apr 23 10:21:43 link kernel: ata1.00: configured for UDMA/133
  Apr 23 10:21:43 link kernel: ata1: EH complete

These go away when switching from med_power_with_dipm to medium_power.

This is somewhat weird as the PM830 datasheet explicitly mentions DIPM
being supported and the idle power-consumption is specified with DIPM
enabled.

There are many OEM customized firmware versions for the PM830, so for now
let's assume this is firmware-version specific and blacklist LPM based on
the firmware version.

Cc: Kevin Shanahan <kevin@xxxxxxxxxxxxxx>
Reported-by: Kevin Shanahan <kevin@xxxxxxxxxxxxxx>
Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
---
 drivers/ata/libata-core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8bc71ca61e7f..6e400ff2b5db 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4549,6 +4549,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 						ATA_HORKAGE_ZERO_AFTER_TRIM |
 						ATA_HORKAGE_NOLPM, },
 
+	/* This specific Samsung model/firmware-rev does not handle LPM well */
+	{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
+
 	/* devices that don't properly handle queued TRIM commands */
 	{ "Micron_M500_*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
 						ATA_HORKAGE_ZERO_AFTER_TRIM, },
-- 
2.17.0

