Re: btrfs raid 10 fileserver with ata errors


> after some googling it's been suggested that it's either a hard drive,
> the sata controller or the sata cables.
> how do i go about diagnosing and fixing the problem,
> any suggestions or guidance would be appreciated.
> shadrock
> I've had this problem before. IIRC, you can match up ata17.00 with the drive it refers to by looking at your kernel boot messages. The first thing I would do is swap out the SATA cable and see if the problem persists. If that doesn't help, scan the drive with the manufacturer's diagnostic program.

Hi everyone,

Here are the tests I've tried and their results:

journalctl -f | grep ata
Jan 13 12:37:13 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:37:13 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:37:13 maybel kernel: ata17.00: cmd 25/00:00:00:24:7d/00:07:1a:00:00/e0 tag 13 dma 917504 in
Jan 13 12:37:13 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:37:13 maybel kernel: ata17: hard resetting link
Jan 13 12:37:13 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:37:13 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:37:13 maybel kernel: ata17: EH complete
Jan 13 12:37:45 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:37:45 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:37:45 maybel kernel: ata17.00: cmd 25/00:00:00:d9:7d/00:07:1a:00:00/e0 tag 25 dma 917504 in
Jan 13 12:37:45 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:37:45 maybel kernel: ata17: hard resetting link
Jan 13 12:37:46 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:37:46 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:37:46 maybel kernel: ata17: EH complete
Jan 13 12:38:19 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:38:19 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:38:19 maybel kernel: ata17.00: cmd 25/00:00:80:d6:81/00:06:1a:00:00/e0 tag 1 dma 786432 in
Jan 13 12:38:19 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:38:19 maybel kernel: ata17: hard resetting link
Jan 13 12:38:20 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:38:20 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:38:20 maybel kernel: ata17: EH complete
Jan 13 12:38:52 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:38:52 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:38:52 maybel kernel: ata17.00: cmd 25/00:80:80:d1:82/00:05:1a:00:00/e0 tag 28 dma 720896 in
Jan 13 12:38:52 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:38:52 maybel kernel: ata17: hard resetting link
Jan 13 12:38:52 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:38:52 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:38:52 maybel kernel: ata17: EH complete
Jan 13 12:39:24 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:39:24 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:39:24 maybel kernel: ata17.00: cmd 25/00:00:00:9d:84/00:05:1a:00:00/e0 tag 1 dma 655360 in
Jan 13 12:39:24 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:39:24 maybel kernel: ata17: hard resetting link
Jan 13 12:39:25 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:39:25 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:39:25 maybel kernel: ata17: EH complete
Jan 13 12:39:57 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:39:57 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:39:57 maybel kernel: ata17.00: cmd 25/00:00:80:b8:85/00:05:1a:00:00/e0 tag 6 dma 655360 in
Jan 13 12:39:57 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:39:57 maybel kernel: ata17: hard resetting link
Jan 13 12:39:57 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:39:57 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:39:57 maybel kernel: ata17: EH complete
Jan 13 12:40:29 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:40:29 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:40:29 maybel kernel: ata17.00: cmd 25/00:80:80:cd:85/00:05:1a:00:00/e0 tag 16 dma 720896 in
Jan 13 12:40:29 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:40:29 maybel kernel: ata17: hard resetting link
Jan 13 12:40:30 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:40:30 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:40:30 maybel kernel: ata17: EH complete
Jan 13 12:41:02 maybel kernel: ata17.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 13 12:41:02 maybel kernel: ata17.00: failed command: READ DMA EXT
Jan 13 12:41:02 maybel kernel: ata17.00: cmd 25/00:80:80:f2:85/00:05:1a:00:00/e0 tag 16 dma 720896 in
Jan 13 12:41:02 maybel kernel: ata17.00: status: { DRDY }
Jan 13 12:41:02 maybel kernel: ata17: hard resetting link
Jan 13 12:41:02 maybel kernel: ata17: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 13 12:41:02 maybel kernel: ata17.00: configured for UDMA/33
Jan 13 12:41:02 maybel kernel: ata17: EH complete
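
The taskfile string that follows "cmd" in each failed READ DMA EXT can be decoded to see which sectors the kernel was trying to read. The sketch below is only an illustration: the function name decode_taskfile is made up here, and it assumes the field order libata uses when reporting a failed command (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), 48-bit LBA addressing, and 512-byte logical sectors as reported by smartctl for this drive.

```shell
# Decode a libata "cmd" taskfile string into start LBA, sector count and bytes.
# Assumed field order:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# The body runs in a subshell so the IFS and globbing changes do not leak out.
decode_taskfile() (
    IFS='/:'
    set -f            # no pathname expansion while splitting
    set -- $1         # split the string into its 12 hex fields
    if [ "$#" -ne 12 ]; then
        echo "expected 12 taskfile fields, got $#" >&2
        exit 1
    fi
    # 48-bit LBA: high-order (hob) bytes first, then the low-order bytes.
    lba=$(( 0x${11}${10}${9}${6}${5}${4} ))
    # 16-bit sector count: hob_nsect is the high byte, nsect the low byte.
    count=$(( 0x${8}${3} ))
    # For 48-bit commands a count of zero means 65536 sectors.
    [ "$count" -ne 0 ] || count=65536
    printf 'lba=%d sectors=%d bytes=%d\n' "$lba" "$count" "$(( count * 512 ))"
)
```

For the first failure above, decode_taskfile '25/00:00:00:24:7d/00:07:1a:00:00/e0' gives lba=444408832 sectors=1792 bytes=917504, and the byte count agrees with the "dma 917504 in" printed by the kernel on the same line.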

[alarm@maybel ~]$ ls -l /sys/block/ | grep sd.
lrwxrwxrwx 1 root root 0 Jan  9 14:21 sda ->
lrwxrwxrwx 1 root root 0 Jan  9 14:26 sdb ->
lrwxrwxrwx 1 root root 0 Jan  9 14:26 sdc ->
lrwxrwxrwx 1 root root 0 Jan  9 14:26 sdd ->
lrwxrwxrwx 1 root root 0 Jan  9 14:26 sde ->
lrwxrwxrwx 1 root root 0 Jan  9 14:26 sdf ->
lrwxrwxrwx 1 root root 0 Jan  9 14:26 sdg ->
lrwxrwxrwx 1 root root 0 Jan 10 02:43 sdh ->
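
The symlink targets are cut off in the listing above, so here is a minimal sketch of how the ataN port behind each sdX could be read from the resolved sysfs path. It assumes a kernel that exposes an ataN component in that path; the function names and the example path in the comment are hypothetical, not taken from this machine.

```shell
# Print the first ataN component of a sysfs path, or nothing if there is none.
# Example (hypothetical) path:
#   /sys/devices/pci0000:00/0000:00:0e.0/0000:03:00.0/ata17/host16/target16:0:0/16:0:0:0/block/sdf
ata_port_of() {
    printf '%s\n' "$1" | grep -oE '/ata[0-9]+/' | head -n 1 | tr -d '/'
}

# Walk /sys/block and print "sdX ataN" for every SCSI/SATA disk found.
# Disks without an ataN component (for example USB storage) are marked as such.
list_ata_ports() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue
        port=$(ata_port_of "$(readlink -f "$dev")")
        printf '%s %s\n' "${dev##*/}" "${port:-no-ata-port}"
    done
}
```

Calling list_ata_ports on the server should then show which disk is attached to ata17 without having to dig through the boot messages.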

sudo smartctl -i /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke,

Device Model:     HITACHI HUA722010ALA330
Serial Number:    N136GXML
LU WWN Device Id: 5 000cca 39ced38c2
Firmware Version: JP4ONA00
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 1.5 Gb/s
Local Time is:    Fri Jan 13 12:59:51 2017 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

sudo smartctl -t short /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke,

Sending command: "Execute SMART Short self-test routine immediately in
off-line mode".
Drive command "Execute SMART Short self-test routine immediately in
off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Fri Jan 13 13:31:48 2017

Use smartctl -X to abort test.

sudo smartctl -a /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke,

Device Model:     HITACHI HUA722010ALA330
Serial Number:    N136GXML
LU WWN Device Id: 5 000cca 39ced38c2
Firmware Version: JP4ONA00
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 1.5 Gb/s
Local Time is:    Fri Jan 13 13:34:00 2017 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)    Offline data collection activity
                    was aborted by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  41)    The self-test routine was interrupted
                    by the host with a hard or soft reset.
Total time to complete Offline
data collection:         ( 9929) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 166) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       91
  3 Spin_Up_Time            0x0007   130   130   024    Pre-fail  Always       -       278 (Average 305)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       69
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   138   138   020    Pre-fail  Offline      -       31
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       10782
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       68
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       123
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       123
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Min/Max 12/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Interrupted (host reset)      90%     10782         -
# 2  Short offline       Interrupted (host reset)      90%     10776         -
# 3  Short offline       Interrupted (host reset)      90%     10752         -
# 4  Short offline       Interrupted (host reset)      90%     10728         -
# 5  Short offline       Completed without error       00%     10703         -
# 6  Short offline       Interrupted (host reset)      90%     10656         -
# 7  Short offline       Interrupted (host reset)      90%     10632         -
# 8  Extended offline    Interrupted (host reset)      90%     10628         -
# 9  Short offline       Interrupted (host reset)      90%     10608         -
#10  Short offline       Interrupted (host reset)      90%     10584         -
#11  Short offline       Interrupted (host reset)      90%     10560         -
#12  Short offline       Interrupted (host reset)      90%     10537         -
#13  Short offline       Interrupted (host reset)      90%     10513         -
#14  Short offline       Interrupted (host reset)      90%     10489         -
#15  Short offline       Interrupted (host reset)      90%     10465         -
#16  Extended offline    Interrupted (host reset)      90%     10461         -
#17  Short offline       Interrupted (host reset)      90%     10441         -
#18  Short offline       Interrupted (host reset)      90%     10417         -
#19  Short offline       Interrupted (host reset)      90%     10393         -
#20  Short offline       Completed without error       00%     10368         -
#21  Short offline       Completed without error       00%     10344         -
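
To put a number on how often the self-tests get cut short, the log can be tallied with a small awk filter. This is just a sketch; the function name summarize_selftests is invented here, and it only looks at the line of each entry that starts with "#", so it does not matter whether the entries are wrapped.

```shell
# Count self-test log entries by outcome. Reads smartctl self-test log text on stdin.
summarize_selftests() {
    awk '
        /^# *[0-9]+ / {
            if (/Interrupted \(host reset\)/)    interrupted++
            else if (/Completed without error/)  completed++
            else                                 other++
        }
        END {
            printf "interrupted=%d completed=%d other=%d\n", interrupted, completed, other
        }
    '
}
```

It could be fed with something like: sudo smartctl -l selftest /dev/sdf | summarize_selftests. For the 21 entries listed above it should report 18 interrupted and 3 completed.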

SMART Selective self-test log data structure revision number 1
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

lspci | grep SATA
00:05.0 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: NVIDIA Corporation MCP55 SATA Controller (rev a3)
02:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11)
03:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
03:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
04:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
04:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)

sudo lshw -c storage
       description: Mass storage device
       product: AS2105
       vendor: ASMedia
       physical id: 6
       bus info: usb@1:6
       version: 0.01
       serial: WD-WCC4N5AHPU2D
       capabilities: usb-2.10 scsi
       configuration: driver=usb-storage speed=480Mbit/s
       description: IDE interface
       product: MCP55 IDE
       vendor: NVIDIA Corporation
       physical id: 7
       bus info: pci@0000:00:04.0
       version: a1
       width: 32 bits
       clock: 66MHz
       capabilities: ide pm bus_master cap_list
       configuration: driver=pata_amd latency=0 maxlatency=1 mingnt=3
       resources: irq:0 ioport:1f0(size=8) ioport:3f6 ioport:170(size=8) ioport:376 ioport:f000(size=16)
       description: IDE interface
       product: MCP55 SATA Controller
       vendor: NVIDIA Corporation
       physical id: 5
       bus info: pci@0000:00:05.0
       version: a3
       width: 32 bits
       clock: 66MHz
       capabilities: ide pm msi ht bus_master cap_list
       configuration: driver=sata_nv latency=0 maxlatency=1 mingnt=3
       resources: irq:21 ioport:9f0(size=8) ioport:bf0(size=4) ioport:970(size=8) ioport:b70(size=4) ioport:dc00(size=16)
       description: IDE interface
       product: MCP55 SATA Controller
       vendor: NVIDIA Corporation
       physical id: 5.1
       bus info: pci@0000:00:05.1
       version: a3
       width: 32 bits
       clock: 66MHz
       capabilities: ide pm msi ht bus_master cap_list
       configuration: driver=sata_nv latency=0 maxlatency=1 mingnt=3
       resources: irq:20 ioport:9e0(size=8) ioport:be0(size=4) ioport:960(size=8) ioport:b60(size=4) ioport:c800(size=16)
       description: IDE interface
       product: MCP55 SATA Controller
       vendor: NVIDIA Corporation
       physical id: 5.2
       bus info: pci@0000:00:05.2
       version: a3
       width: 32 bits
       clock: 66MHz
       capabilities: ide pm msi ht bus_master cap_list
       configuration: driver=sata_nv latency=0 maxlatency=1 mingnt=3
       resources: irq:23 ioport:c400(size=8) ioport:c000(size=4) ioport:bc00(size=8) ioport:b800(size=4) ioport:b400(size=16)
       description: SATA controller
       product: Marvell Technology Group Ltd.
       vendor: Marvell Technology Group Ltd.
       physical id: 0
       bus info: pci@0000:02:00.0
       version: 11
       width: 32 bits
       clock: 33MHz
       capabilities: storage pm msi pciexpress ahci_1.0 bus_master cap_list rom
       configuration: driver=ahci latency=0
       resources: irq:27 ioport:9c00(size=8) ioport:9800(size=4) ioport:9400(size=8) ioport:9000(size=4) ioport:8c00(size=32) memory:fdeff000-fdeff7ff memory:fdee0000-fdeeffff
       description: SATA controller
       product: JMB363 SATA/IDE Controller
       vendor: JMicron Technology Corp.
       physical id: 0
       bus info: pci@0000:03:00.0
       version: 03
       width: 32 bits
       clock: 33MHz
       capabilities: storage pm pciexpress ahci_1.0 bus_master cap_list rom
       configuration: driver=ahci latency=0
       resources: irq:16 memory:fddfe000-fddfffff memory:fdde0000-fddeffff
       description: IDE interface
       product: JMB363 SATA/IDE Controller
       vendor: JMicron Technology Corp.
       physical id: 0.1
       bus info: pci@0000:03:00.1
       version: 03
       width: 32 bits
       clock: 33MHz
       capabilities: ide pm bus_master cap_list
       configuration: driver=pata_jmicron latency=0
       resources: irq:16 ioport:7c00(size=8) ioport:7800(size=4) ioport:7400(size=8) ioport:7000(size=4) ioport:6c00(size=16)
       description: SATA controller
       product: JMB363 SATA/IDE Controller
       vendor: JMicron Technology Corp.
       physical id: 0
       bus info: pci@0000:04:00.0
       version: 03
       width: 32 bits
       clock: 33MHz
       capabilities: storage pm pciexpress ahci_1.0 bus_master cap_list rom
       configuration: driver=ahci latency=0
       resources: irq:16 memory:fdcfe000-fdcfffff memory:fdce0000-fdceffff
       description: IDE interface
       product: JMB363 SATA/IDE Controller
       vendor: JMicron Technology Corp.
       physical id: 0.1
       bus info: pci@0000:04:00.1
       version: 03
       width: 32 bits
       clock: 33MHz
       capabilities: ide pm bus_master cap_list
       configuration: driver=pata_jmicron latency=0
       resources: irq:16 ioport:5c00(size=8) ioport:5800(size=4) ioport:5400(size=8) ioport:5000(size=4) ioport:4c00(size=16)
       physical id: 1
       bus info: scsi@20
       logical name: scsi20
       capabilities: scsi-host
       configuration: driver=usb-storage

/dev/sdf, connected to ata17 on the 03:00.0 controller (JMicron JMB363), is the drive with the problems.
/dev/sdg, connected to ata18 on the same controller, is fine.
/dev/sdf exhibits a long delay when returning the smartctl -a report, and its SMART self-tests are frequently interrupted.
I will try a new cable later and report back.
