Yes, I am aware of the problems with WD Green and multiple partitions on single 4TB disk. I am in the middle of getting rid of old disks and I have enough new drives to stop having multiple partitions on single drives, but not enough power and free SATA ports. It is just a temporary solution. Also a reason why I did not include much details in the original post, I knew it would just distract from the problem I want to solve right away. What I need help with now is just getting the array started with the 16 out of 18 disks. Then I can continue migrating data and replacing old disks as planned. When I built the array in 2012, I used WD Green. They turned out to be horrible disks and I have since replaced some of them with WD Red. The newest disks I've bought are Ironwolves lør. 21. mar. 2020 kl. 01:06 skrev antlists <antlists@xxxxxxxxxxxxxxx>: > > On 20/03/2020 21:05, Glenn Greibesland wrote: > > fre. 20. mar. 2020 kl. 20:15 skrev Wols Lists <antlists@xxxxxxxxxxxxxxx>: > >> > >> On 19/03/20 19:55, Glenn Greibesland wrote: > >>> After a bit of digging in the manual and on different forums I have > >>> concluded that the next step for me is to recreate the array using > >>> –assume-clean and –data-offset=variable. > >>> I have tried a dry run of the command (answering no to “Continue > >>> creating array”), and mdadm accepts the parameters without any errors: > >> > >> Oh my god NO!!! > >> > >> Do NOT use --create unless someone rather more experienced than me tells > >> you to!!! > >> > >> The obvious thing is to somehow get the sixteen drives that you know > >> should be okay, re-assembled in a forced manner. The --re-add should not > >> have done any real damage because, as mdadm keeps complaining, you > >> didn't have enough drives so it won't have touched the data on that > >> drive. Unfortunately, my fu isn't good enough to tell you how to get > >> that drive back in. > >> > >> What's wrong with the two failed drives? Can you ddrescue them? They > >> might be enough to get you going again. > >> > >> You say you've read the web page "Raid recovery" - which says it's > >> obsolete and points you at "When things go wrogn" - but you don't appear > >> to have read that! PLEASE read "asking for help" and in particular you > >> NEED to run lsdrv and give us that information. Without that, if you DO > >> run --create, you will be in for a world of hurt. > >> > >> I know you may feel it's asking for loads of information, and the > >> resulting email will be massive, but trust me - the experts will look at > >> it and they will probably be able to come up with a plan of action. At > >> present, they don't have much to go on, and nor will you if carry on as > >> you're going ... > >> > >> Cheers, > >> Wol > > > > Thanks for replying to the thread. > > > > The two failed drives has "unreadable (pending) sectors", and they > > have a lower Event Count than the other disks, so that is why I've > > been trying to get the array up and running with the remaining 16 > > disks that has the same Event Count. > > > > I concluded myself that --create --assume-clean had to be the only > > thing left to try, that's why I didn't provide any logs or info. Sorry > > about that, you are right, I should check if there is any other > > options first. I've been trying to get this array up and running again > > for quite some time, so I'm all ears if someone has some magic to try. > > Yesterday I read some of the source code of mdadm and sort of answered > > my own question. According to the source code, specifying sizes in > > sectors is supported. I'd still like some confirmation though (talking > > about parse_size function in util.c). > > > > Here's some additional info: > > > > mdadm: added /dev/sdj1 to /dev/md/0 as 0 > > mdadm: added /dev/sdk1 to /dev/md/0 as 1 > > mdadm: added /dev/sdi1 to /dev/md/0 as 2 > > mdadm: added /dev/sdh1 to /dev/md/0 as 3 > > mdadm: added /dev/sdo1 to /dev/md/0 as 4 > > mdadm: added /dev/sdp1 to /dev/md/0 as 5 > > mdadm: added /dev/sdr1 to /dev/md/0 as 6 > > mdadm: added /dev/sdq1 to /dev/md/0 as 7 > > mdadm: added /dev/sdf1 to /dev/md/0 as 8 > > mdadm: added /dev/sdb1 to /dev/md/0 as 9 > > mdadm: added /dev/sdg1 to /dev/md/0 as -1 <<<< This is the drive > > that is now regarded as spare. It originally had slot 10 in the array > > mdadm: added /dev/sdd1 to /dev/md/0 as 11 > > mdadm: added /dev/sdm1 to /dev/md/0 as 12 > > mdadm: added /dev/sdf2 to /dev/md/0 as 13 > > mdadm: added /dev/sdc2 to /dev/md/0 as 16 > > mdadm: added /dev/sdc1 to /dev/md/0 as 17 > > > > > > > > mdadm: no uptodate device for slot 10 of /dev/md/0 << sdg1 > > mdadm: no uptodate device for slot 14 of /dev/md/0 << drive disconnected > > mdadm: no uptodate device for slot 15 of /dev/md/0 << drive disconnected > > > > mdadm: /dev/md/0 assembled from 15 drives and 1 spare - not enough to > > start the array. > > > > mdadm -D /dev/md0 > > /dev/md0: > > Version : 1.2 > > Raid Level : raid0 > > Total Devices : 16 > > Persistence : Superblock is persistent > > > > State : inactive > > Working Devices : 16 > > > > Name : vm-test:0 > > UUID : 45ced2f9:947773d4:106077ab:2df799d6 > > Events : 1937517 > > > > Number Major Minor RaidDevice > > > > - 8 17 - /dev/sdb1 > > - 8 33 - /dev/sdc1 > > - 8 34 - /dev/sdc2 > > What's this? Two partitions in the array on the same physical disk? > > > - 8 49 - /dev/sdd1 > > - 8 81 - /dev/sdf1 > > - 8 82 - /dev/sdf2 > > And again? > > > - 8 97 - /dev/sdg1 > > - 8 113 - /dev/sdh1 > > - 8 129 - /dev/sdi1 > > - 8 145 - /dev/sdj1 > > - 8 161 - /dev/sdk1 > > - 8 193 - /dev/sdm1 > > - 8 241 - /dev/sdp1 > > - 65 1 - /dev/sdq1 > > - 65 17 - /dev/sdr1 > > - 65 33 - /dev/sds1 > > > > > > > > SMART WRITE LOG does not return COUNT and LBA_LOW register > > SCT (Get) Error Recovery Control command failed > > Which disk is this? No error recovery? BAD sign ... > > > > Device Statistics (GP/SMART Log 0x04) not supported > > > > SATA Phy Event Counters (GP Log 0x11) > > ID Size Value Description > > 0x0001 2 0 Command failed due to ICRC error > > 0x0002 2 0 R_ERR response for data FIS > > 0x0003 2 0 R_ERR response for device-to-host data FIS > > 0x0004 2 0 R_ERR response for host-to-device data FIS > > 0x0005 2 0 R_ERR response for non-data FIS > > 0x0006 2 0 R_ERR response for device-to-host non-data FIS > > 0x0007 2 0 R_ERR response for host-to-device non-data FIS > > 0x0008 2 0 Device-to-host non-data FIS retries > > 0x0009 2 2 Transition from drive PhyRdy to drive PhyNRdy > > 0x000a 2 2 Device-to-host register FISes sent due to a COMRESET > > 0x000b 2 0 CRC errors within host-to-device FIS > > 0x000f 2 0 R_ERR response for host-to-device data FIS, CRC > > 0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC > > 0x8000 4 1208382 Vendor specific > > > > smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build) > > Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org > > > > > > > > smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build) > > Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org > > > > > > > > smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build) > > Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org > > > > === START OF INFORMATION SECTION === > > Model Family: Western Digital Green > > What's this? > > > Device Model: WDC WD20EARX-00PASB0 > > Serial Number: WD-WMAZA9538601 > > LU WWN Device Id: 5 0014ee 15a0a4ffa > > Firmware Version: 51.0AB51 > > User Capacity: 2,000,398,934,016 bytes [2.00 TB] > > Sector Sizes: 512 bytes logical, 4096 bytes physical > > Device is: In smartctl database [for details use: -P show] > > ATA Version is: ATA8-ACS (minor revision not indicated) > > SATA Version is: SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s) > > Local Time is: Fri Mar 20 21:00:38 2020 CET > > SMART support is: Available - device has SMART capability. > > SMART support is: Enabled > > AAM feature is: Unavailable > > APM feature is: Unavailable > > Rd look-ahead is: Enabled > > Write cache is: Enabled > > ATA Security is: Disabled, NOT FROZEN [SEC1] > > Wt Cache Reorder: Enabled > > > > === START OF READ SMART DATA SECTION === > > SMART overall-health self-assessment test result: PASSED > > > > General SMART Values: > > Offline data collection status: (0x84) Offline data collection activity > > was suspended by an interrupting command from host. > > Auto Offline Data Collection: Enabled. > > Self-test execution status: ( 0) The previous self-test routine completed > > without error or no self-test has ever > > been run. > > Total time to complete Offline > > data collection: (37200) seconds. > > Offline data collection > > capabilities: (0x7b) SMART execute Offline immediate. > > Auto Offline data collection on/off support. > > Suspend Offline collection upon new > > command. > > Offline surface scan supported. > > Self-test supported. > > Conveyance Self-test supported. > > Selective Self-test supported. > > SMART capabilities: (0x0003) Saves SMART data before entering > > power-saving mode. > > Supports SMART auto save timer. > > Error logging capability: (0x01) Error logging supported. > > General Purpose Logging supported. > > Short self-test routine > > recommended polling time: ( 2) minutes. > > Extended self-test routine > > recommended polling time: ( 359) minutes. > > Conveyance self-test routine > > recommended polling time: ( 5) minutes. > > SCT capabilities: (0x3035) SCT Status supported. > > SCT Feature Control supported. > > SCT Data Table supported. > > No mention of ERC - Bad sign ... > > > > SMART Attributes Data Structure revision number: 16 > > Vendor Specific SMART Attributes with Thresholds: > > ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE > > 1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0 > > 3 Spin_Up_Time POS--K 171 171 021 - 6416 > > 4 Start_Stop_Count -O--CK 100 100 000 - 255 > > 5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0 > > 7 Seek_Error_Rate -OSR-K 200 200 000 - 0 > > 9 Power_On_Hours -O--CK 098 098 000 - 1583 > > 10 Spin_Retry_Count -O--CK 100 100 000 - 0 > > 11 Calibration_Retry_Count -O--CK 100 100 000 - 0 > > 12 Power_Cycle_Count -O--CK 100 100 000 - 131 > > 192 Power-Off_Retract_Count -O--CK 200 200 000 - 61 > > 193 Load_Cycle_Count -O--CK 191 191 000 - 29372 > > 194 Temperature_Celsius -O---K 122 101 000 - 28 > > 196 Reallocated_Event_Count -O--CK 200 200 000 - 0 > > 197 Current_Pending_Sector -O--CK 200 200 000 - 0 > > 198 Offline_Uncorrectable ----CK 200 200 000 - 0 > > 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 > > 200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0 > > ||||||_ K auto-keep > > |||||__ C event count > > ||||___ R error rate > > |||____ S speed/performance > > ||_____ O updated online > > |______ P prefailure warning > > > > General Purpose Log Directory Version 1 > > SMART Log Directory Version 1 [multi-sector log support] > > Address Access R/W Size Description > > 0x00 GPL,SL R/O 1 Log Directory > > 0x01 SL R/O 1 Summary SMART error log > > 0x02 SL R/O 5 Comprehensive SMART error log > > 0x03 GPL R/O 6 Ext. Comprehensive SMART error log > > 0x06 SL R/O 1 SMART self-test log > > 0x07 GPL R/O 1 Extended self-test log > > 0x09 SL R/W 1 Selective self-test log > > 0x10 GPL R/O 1 SATA NCQ Queued Error log > > 0x11 GPL R/O 1 SATA Phy Event Counters log > > 0x80-0x9f GPL,SL R/W 16 Host vendor specific log > > 0xa0-0xa7 GPL,SL VS 16 Device vendor specific log > > 0xa8-0xb7 GPL,SL VS 1 Device vendor specific log > > 0xbd GPL,SL VS 1 Device vendor specific log > > 0xc0 GPL,SL VS 1 Device vendor specific log > > 0xc1 GPL VS 93 Device vendor specific log > > 0xe0 GPL,SL R/W 1 SCT Command/Status > > 0xe1 GPL,SL R/W 1 SCT Data Transfer > > > > SMART Extended Comprehensive Error Log Version: 1 (6 sectors) > > No Errors Logged > > > > SMART Extended Self-test Log Version: 1 (1 sectors) > > Num Test_Description Status Remaining > > LifeTime(hours) LBA_of_first_error > > # 1 Short offline Completed without error 00% 1245 - > > > > SMART Selective self-test log data structure revision number 1 > > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS > > 1 0 0 Not_testing > > 2 0 0 Not_testing > > 3 0 0 Not_testing > > 4 0 0 Not_testing > > 5 0 0 Not_testing > > Selective self-test flags (0x0): > > After scanning selected spans, do NOT read-scan remainder of disk. > > If Selective self-test is pending on power-up, resume after 0 minute delay. > > > > SCT Status Version: 3 > > SCT Version (vendor specific): 258 (0x0102) > > SCT Support Level: 1 > > Device State: Active (0) > > Current Temperature: 28 Celsius > > Power Cycle Min/Max Temperature: 8/43 Celsius > > Lifetime Min/Max Temperature: 0/49 Celsius > > Under/Over Temperature Limit Count: 0/0 > > > > SCT Temperature History Version: 2 > > Temperature Sampling Period: 1 minute > > Temperature Logging Interval: 1 minute > > Min/Max recommended Temperature: 0/60 Celsius > > Min/Max Temperature Limit: -41/85 Celsius > > Temperature History Size (Index): 478 (305) > > > > Index Estimated Time Temperature Celsius > > 306 2020-03-20 13:03 23 **** > > ... ..( 33 skipped). .. **** > > 340 2020-03-20 13:37 23 **** > > 341 2020-03-20 13:38 ? - > > 342 2020-03-20 13:39 23 **** > > 343 2020-03-20 13:40 23 **** > > 344 2020-03-20 13:41 24 ***** > > 345 2020-03-20 13:42 25 ****** > > 346 2020-03-20 13:43 25 ****** > > 347 2020-03-20 13:44 25 ****** > > 348 2020-03-20 13:45 26 ******* > > ... ..( 2 skipped). .. ******* > > 351 2020-03-20 13:48 26 ******* > > 352 2020-03-20 13:49 27 ******** > > 353 2020-03-20 13:50 27 ******** > > 354 2020-03-20 13:51 28 ********* > > 355 2020-03-20 13:52 28 ********* > > 356 2020-03-20 13:53 22 *** > > ... ..(276 skipped). .. *** > > 155 2020-03-20 18:30 22 *** > > 156 2020-03-20 18:31 23 **** > > ... ..(148 skipped). .. **** > > 305 2020-03-20 21:00 23 **** > > > > SCT Error Recovery Control command not supported > > Yup. Ouch! > > > > Device Statistics (GP/SMART Log 0x04) not supported > > > > SATA Phy Event Counters (GP Log 0x11) > > ID Size Value Description > > 0x0001 2 0 Command failed due to ICRC error > > 0x0002 2 0 R_ERR response for data FIS > > 0x0003 2 0 R_ERR response for device-to-host data FIS > > 0x0004 2 0 R_ERR response for host-to-device data FIS > > 0x0005 2 0 R_ERR response for non-data FIS > > 0x0006 2 0 R_ERR response for device-to-host non-data FIS > > 0x0007 2 0 R_ERR response for host-to-device non-data FIS > > 0x000a 2 5 Device-to-host register FISes sent due to a COMRESET > > 0x000b 2 0 CRC errors within host-to-device FIS > > 0x8000 4 1208379 Vendor specific > > > > smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build) > > Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org > > > > === START OF INFORMATION SECTION === > > Model Family: Western Digital Red > > Device Model: WDC WD20EFRX-68AX9N0 > > Serial Number: WD-WMC300320657 > > LU WWN Device Id: 5 0014ee 0ae1ee098 > > Firmware Version: 80.00A80 > > User Capacity: 2,000,398,934,016 bytes [2.00 TB] > > Sector Sizes: 512 bytes logical, 4096 bytes physical > > Device is: In smartctl database [for details use: -P show] > > ATA Version is: ACS-2 (minor revision not indicated) > > SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) > > Local Time is: Fri Mar 20 21:00:38 2020 CET > > SMART support is: Available - device has SMART capability. > > SMART support is: Enabled > > AAM feature is: Unavailable > > APM feature is: Unavailable > > Rd look-ahead is: Enabled > > Write cache is: Enabled > > ATA Security is: Disabled, NOT FROZEN [SEC1] > > Wt Cache Reorder: Unknown > > > > === START OF READ SMART DATA SECTION === > > SMART overall-health self-assessment test result: PASSED > > > > General SMART Values: > > Offline data collection status: (0x00) Offline data collection activity > > was never started. > > Auto Offline Data Collection: Disabled. > > Self-test execution status: ( 0) The previous self-test routine completed > > without error or no self-test has ever > > been run. > > Total time to complete Offline > > data collection: (27120) seconds. > > Offline data collection > > capabilities: (0x7b) SMART execute Offline immediate. > > Auto Offline data collection on/off support. > > Suspend Offline collection upon new > > command. > > Offline surface scan supported. > > Self-test supported. > > Conveyance Self-test supported. > > Selective Self-test supported. > > SMART capabilities: (0x0003) Saves SMART data before entering > > power-saving mode. > > Supports SMART auto save timer. > > Error logging capability: (0x01) Error logging supported. > > General Purpose Logging supported. > > Short self-test routine > > recommended polling time: ( 2) minutes. > > Extended self-test routine > > recommended polling time: ( 274) minutes. > > Conveyance self-test routine > > recommended polling time: ( 5) minutes. > > SCT capabilities: (0x70bd) SCT Status supported. > > SCT Error Recovery Control supported. > > SCT Feature Control supported. > > SCT Data Table supported. > > > > SMART Attributes Data Structure revision number: 16 > > Vendor Specific SMART Attributes with Thresholds: > > ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE > > 1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0 > > 3 Spin_Up_Time POS--K 176 169 021 - 4183 > > 4 Start_Stop_Count -O--CK 100 100 000 - 502 > > 5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0 > > 7 Seek_Error_Rate -OSR-K 200 200 000 - 0 > > 9 Power_On_Hours -O--CK 061 061 000 - 28588 > > 10 Spin_Retry_Count -O--CK 100 100 000 - 0 > > 11 Calibration_Retry_Count -O--CK 100 100 000 - 0 > > 12 Power_Cycle_Count -O--CK 100 100 000 - 490 > > 192 Power-Off_Retract_Count -O--CK 200 200 000 - 483 > > 193 Load_Cycle_Count -O--CK 200 200 000 - 18 > > 194 Temperature_Celsius -O---K 120 089 000 - 27 > > 196 Reallocated_Event_Count -O--CK 200 200 000 - 0 > > 197 Current_Pending_Sector -O--CK 200 200 000 - 0 > > 198 Offline_Uncorrectable ----CK 100 253 000 - 0 > > 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 > > 200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0 > > ||||||_ K auto-keep > > |||||__ C event count > > ||||___ R error rate > > |||____ S speed/performance > > ||_____ O updated online > > |______ P prefailure warning > > > > General Purpose Log Directory Version 1 > > SMART Log Directory Version 1 [multi-sector log support] > > Address Access R/W Size Description > > 0x00 GPL,SL R/O 1 Log Directory > > 0x01 SL R/O 1 Summary SMART error log > > 0x02 SL R/O 5 Comprehensive SMART error log > > 0x03 GPL R/O 6 Ext. Comprehensive SMART error log > > 0x06 SL R/O 1 SMART self-test log > > 0x07 GPL R/O 1 Extended self-test log > > 0x09 SL R/W 1 Selective self-test log > > 0x10 GPL R/O 1 SATA NCQ Queued Error log > > 0x11 GPL R/O 1 SATA Phy Event Counters log > > 0x21 GPL R/O 1 Write stream error log > > 0x22 GPL R/O 1 Read stream error log > > 0x80-0x9f GPL,SL R/W 16 Host vendor specific log > > 0xa0-0xa7 GPL,SL VS 16 Device vendor specific log > > 0xa8-0xb7 GPL,SL VS 1 Device vendor specific log > > 0xbd GPL,SL VS 1 Device vendor specific log > > 0xc0 GPL,SL VS 1 Device vendor specific log > > 0xc1 GPL VS 93 Device vendor specific log > > 0xe0 GPL,SL R/W 1 SCT Command/Status > > 0xe1 GPL,SL R/W 1 SCT Data Transfer > > > > SMART Extended Comprehensive Error Log Version: 1 (6 sectors) > > No Errors Logged > > > > SMART Extended Self-test Log Version: 1 (1 sectors) > > Num Test_Description Status Remaining > > LifeTime(hours) LBA_of_first_error > > # 1 Short offline Completed without error 00% 26024 - > > > > SMART Selective self-test log data structure revision number 1 > > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS > > 1 0 0 Not_testing > > 2 0 0 Not_testing > > 3 0 0 Not_testing > > 4 0 0 Not_testing > > 5 0 0 Not_testing > > Selective self-test flags (0x0): > > After scanning selected spans, do NOT read-scan remainder of disk. > > If Selective self-test is pending on power-up, resume after 0 minute delay. > > > > SCT Status Version: 3 > > SCT Version (vendor specific): 258 (0x0102) > > SCT Support Level: 1 > > Device State: Active (0) > > Current Temperature: 27 Celsius > > Power Cycle Min/Max Temperature: 10/32 Celsius > > Lifetime Min/Max Temperature: 2/58 Celsius > > Under/Over Temperature Limit Count: 0/0 > > Vendor specific: > > 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > > SCT Temperature History Version: 2 > > Temperature Sampling Period: 1 minute > > Temperature Logging Interval: 1 minute > > Min/Max recommended Temperature: 0/60 Celsius > > Min/Max Temperature Limit: -41/85 Celsius > > Temperature History Size (Index): 478 (56) > > > > Index Estimated Time Temperature Celsius > > 57 2020-03-20 13:03 24 ***** > > ... ..(377 skipped). .. ***** > > 435 2020-03-20 19:21 24 ***** > > 436 2020-03-20 19:22 ? - > > 437 2020-03-20 19:23 24 ***** > > 438 2020-03-20 19:24 25 ****** > > ... ..( 3 skipped). .. ****** > > 442 2020-03-20 19:28 25 ****** > > 443 2020-03-20 19:29 26 ******* > > 444 2020-03-20 19:30 26 ******* > > 445 2020-03-20 19:31 26 ******* > > 446 2020-03-20 19:32 27 ******** > > ... ..( 3 skipped). .. ******** > > 450 2020-03-20 19:36 27 ******** > > 451 2020-03-20 19:37 24 ***** > > ... ..( 82 skipped). .. ***** > > 56 2020-03-20 21:00 24 ***** > > > > SCT Error Recovery Control: > > Read: Disabled > > Write: Disabled > > What's going on here? We have a RED drive, but ERC isn't working ... > > > > Device Statistics (GP/SMART Log 0x04) not supported > > > > SATA Phy Event Counters (GP Log 0x11) > > ID Size Value Description > > 0x0001 2 0 Command failed due to ICRC error > > 0x0002 2 0 R_ERR response for data FIS > > 0x0003 2 0 R_ERR response for device-to-host data FIS > > 0x0004 2 0 R_ERR response for host-to-device data FIS > > 0x0005 2 0 R_ERR response for non-data FIS > > 0x0006 2 0 R_ERR response for device-to-host non-data FIS > > 0x0007 2 0 R_ERR response for host-to-device non-data FIS > > 0x0008 2 0 Device-to-host non-data FIS retries > > 0x0009 2 33 Transition from drive PhyRdy to drive PhyNRdy > > 0x000a 2 34 Device-to-host register FISes sent due to a COMRESET > > 0x000b 2 0 CRC errors within host-to-device FIS > > 0x000f 2 0 R_ERR response for host-to-device data FIS, CRC > > 0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC > > 0x8000 4 1208361 Vendor specific > > > > smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build) > > Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org > > > > === START OF INFORMATION SECTION === > > Device Model: ST4000VN008-2DR166 > > Serial Number: ZDH82183 > > LU WWN Device Id: 5 000c50 0c37c42c0 > > Firmware Version: SC60 > > User Capacity: 4,000,787,030,016 bytes [4.00 TB] > > Sector Sizes: 512 bytes logical, 4096 bytes physical > > Rotation Rate: 5980 rpm > > Form Factor: 3.5 inches > > Device is: Not in smartctl database [for details use: -P showall] > > ATA Version is: ACS-3 T13/2161-D revision 5 > > SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s) > > Local Time is: Fri Mar 20 21:00:38 2020 CET > > SMART support is: Available - device has SMART capability. > > SMART support is: Enabled > > AAM feature is: Unavailable > > APM level is: 254 (maximum performance) > > Rd look-ahead is: Enabled > > Write cache is: Enabled > > ATA Security is: Disabled, NOT FROZEN [SEC1] > > Wt Cache Reorder: Unknown > > > > === START OF READ SMART DATA SECTION === > > SMART overall-health self-assessment test result: PASSED > > > > General SMART Values: > > Offline data collection status: (0x82) Offline data collection activity > > was completed without error. > > Auto Offline Data Collection: Enabled. > > Self-test execution status: ( 0) The previous self-test routine completed > > without error or no self-test has ever > > been run. > > Total time to complete Offline > > data collection: ( 581) seconds. > > Offline data collection > > capabilities: (0x7b) SMART execute Offline immediate. > > Auto Offline data collection on/off support. > > Suspend Offline collection upon new > > command. > > Offline surface scan supported. > > Self-test supported. > > Conveyance Self-test supported. > > Selective Self-test supported. > > SMART capabilities: (0x0003) Saves SMART data before entering > > power-saving mode. > > Supports SMART auto save timer. > > Error logging capability: (0x01) Error logging supported. > > General Purpose Logging supported. > > Short self-test routine > > recommended polling time: ( 1) minutes. > > Extended self-test routine > > recommended polling time: ( 621) minutes. > > Conveyance self-test routine > > recommended polling time: ( 2) minutes. > > SCT capabilities: (0x50bd) SCT Status supported. > > SCT Error Recovery Control supported. > > SCT Feature Control supported. > > SCT Data Table supported. > > > > SMART Attributes Data Structure revision number: 10 > > Vendor Specific SMART Attributes with Thresholds: > > ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE > > 1 Raw_Read_Error_Rate POSR-- 070 065 044 - 10856451 > > 3 Spin_Up_Time PO---- 094 094 000 - 0 > > 4 Start_Stop_Count -O--CK 100 100 020 - 53 > > 5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0 > > 7 Seek_Error_Rate POSR-- 075 061 045 - 29667756 > > 9 Power_On_Hours -O--CK 100 100 000 - 506 (130 79 0) > > 10 Spin_Retry_Count PO--C- 100 100 097 - 0 > > 12 Power_Cycle_Count -O--CK 100 100 020 - 5 > > 184 End-to-End_Error -O--CK 100 100 099 - 0 > > 187 Reported_Uncorrect -O--CK 100 100 000 - 0 > > 188 Command_Timeout -O--CK 098 098 000 - 65538 > > 189 High_Fly_Writes -O-RCK 100 100 000 - 0 > > 190 Airflow_Temperature_Cel -O---K 076 070 040 - 24 (Min/Max 9/26) > > 191 G-Sense_Error_Rate -O--CK 100 100 000 - 0 > > 192 Power-Off_Retract_Count -O--CK 100 100 000 - 44 > > 193 Load_Cycle_Count -O--CK 100 100 000 - 284 > > 194 Temperature_Celsius -O---K 024 040 000 - 24 (0 9 0 0 0) > > 197 Current_Pending_Sector -O--C- 100 100 000 - 0 > > 198 Offline_Uncorrectable ----C- 100 100 000 - 0 > > 199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0 > > 240 Head_Flying_Hours ------ 100 253 000 - 139 (51 45 0) > > 241 Total_LBAs_Written ------ 100 253 000 - 8177237744 > > 242 Total_LBAs_Read ------ 100 253 000 - 5818370819 > > ||||||_ K auto-keep > > |||||__ C event count > > ||||___ R error rate > > |||____ S speed/performance > > ||_____ O updated online > > |______ P prefailure warning > > > > General Purpose Log Directory Version 1 > > SMART Log Directory Version 1 [multi-sector log support] > > Address Access R/W Size Description > > 0x00 GPL,SL R/O 1 Log Directory > > 0x01 SL R/O 1 Summary SMART error log > > 0x02 SL R/O 5 Comprehensive SMART error log > > 0x03 GPL R/O 5 Ext. Comprehensive SMART error log > > 0x04 GPL,SL R/O 8 Device Statistics log > > 0x06 SL R/O 1 SMART self-test log > > 0x07 GPL R/O 1 Extended self-test log > > 0x09 SL R/W 1 Selective self-test log > > 0x10 GPL R/O 1 SATA NCQ Queued Error log > > 0x11 GPL R/O 1 SATA Phy Event Counters log > > 0x13 GPL R/O 1 SATA NCQ Send and Receive log > > 0x15 GPL R/W 1 SATA Rebuild Assist log > > 0x21 GPL R/O 1 Write stream error log > > 0x22 GPL R/O 1 Read stream error log > > 0x24 GPL R/O 512 Current Device Internal Status Data log > > 0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log > > 0x80-0x9f GPL,SL R/W 16 Host vendor specific log > > 0xa1 GPL,SL VS 24 Device vendor specific log > > 0xa2 GPL VS 8160 Device vendor specific log > > 0xa6 GPL VS 192 Device vendor specific log > > 0xa8-0xa9 GPL,SL VS 136 Device vendor specific log > > 0xab GPL VS 1 Device vendor specific log > > 0xb0 GPL VS 9048 Device vendor specific log > > 0xbe-0xbf GPL VS 65535 Device vendor specific log > > 0xc1 GPL,SL VS 16 Device vendor specific log > > 0xd1 GPL VS 136 Device vendor specific log > > 0xd2 GPL VS 10000 Device vendor specific log > > 0xd3 GPL VS 1920 Device vendor specific log > > 0xe0 GPL,SL R/W 1 SCT Command/Status > > 0xe1 GPL,SL R/W 1 SCT Data Transfer > > > > SMART Extended Comprehensive Error Log Version: 1 (5 sectors) > > No Errors Logged > > > > SMART Extended Self-test Log Version: 1 (1 sectors) > > No self-tests have been logged. [To run self-tests, use: smartctl -t] > > > > SMART Selective self-test log data structure revision number 1 > > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS > > 1 0 0 Not_testing > > 2 0 0 Not_testing > > 3 0 0 Not_testing > > 4 0 0 Not_testing > > 5 0 0 Not_testing > > Selective self-test flags (0x0): > > After scanning selected spans, do NOT read-scan remainder of disk. > > If Selective self-test is pending on power-up, resume after 0 minute delay. > > > > SCT Status Version: 3 > > SCT Version (vendor specific): 522 (0x020a) > > SCT Support Level: 1 > > Device State: Active (0) > > Current Temperature: 23 Celsius > > Power Cycle Min/Max Temperature: 8/26 Celsius > > Lifetime Min/Max Temperature: 8/30 Celsius > > Under/Over Temperature Limit Count: 0/336 > > > > SCT Temperature History Version: 2 > > Temperature Sampling Period: 3 minutes > > Temperature Logging Interval: 59 minutes > > Min/Max recommended Temperature: 0/ 0 Celsius > > Min/Max Temperature Limit: 0/ 0 Celsius > > Temperature History Size (Index): 128 (119) > > > > Index Estimated Time Temperature Celsius > > 120 2020-03-15 16:02 21 ** > > ... ..( 5 skipped). .. ** > > 126 2020-03-15 21:56 21 ** > > 127 2020-03-15 22:55 22 *** > > ... ..( 16 skipped). .. *** > > 16 2020-03-16 15:38 22 *** > > 17 2020-03-16 16:37 23 **** > > ... ..( 3 skipped). .. **** > > 21 2020-03-16 20:33 23 **** > > 22 2020-03-16 21:32 24 ***** > > 23 2020-03-16 22:31 23 **** > > 24 2020-03-16 23:30 24 ***** > > 25 2020-03-17 00:29 24 ***** > > 26 2020-03-17 01:28 24 ***** > > 27 2020-03-17 02:27 23 **** > > ... ..( 7 skipped). .. **** > > 35 2020-03-17 10:19 23 **** > > 36 2020-03-17 11:18 22 *** > > ... ..( 3 skipped). .. *** > > 40 2020-03-17 15:14 22 *** > > 41 2020-03-17 16:13 23 **** > > ... ..( 14 skipped). .. **** > > 56 2020-03-18 06:58 23 **** > > 57 2020-03-18 07:57 22 *** > > ... ..( 2 skipped). .. *** > > 60 2020-03-18 10:54 22 *** > > 61 2020-03-18 11:53 21 ** > > 62 2020-03-18 12:52 20 * > > 63 2020-03-18 13:51 21 ** > > 64 2020-03-18 14:50 20 * > > 65 2020-03-18 15:49 20 * > > 66 2020-03-18 16:48 21 ** > > ... ..( 5 skipped). .. ** > > 72 2020-03-18 22:42 21 ** > > 73 2020-03-18 23:41 24 ***** > > 74 2020-03-19 00:40 26 ******* > > ... ..( 2 skipped). .. ******* > > 77 2020-03-19 03:37 26 ******* > > 78 2020-03-19 04:36 22 *** > > ... ..( 2 skipped). .. *** > > 81 2020-03-19 07:33 22 *** > > 82 2020-03-19 08:32 21 ** > > 83 2020-03-19 09:31 22 *** > > 84 2020-03-19 10:30 22 *** > > 85 2020-03-19 11:29 21 ** > > ... ..( 2 skipped). .. ** > > 88 2020-03-19 14:26 21 ** > > 89 2020-03-19 15:25 25 ****** > > 90 2020-03-19 16:24 25 ****** > > 91 2020-03-19 17:23 26 ******* > > 92 2020-03-19 18:22 25 ****** > > 93 2020-03-19 19:21 22 *** > > ... ..( 3 skipped). .. *** > > 97 2020-03-19 23:17 22 *** > > 98 2020-03-20 00:16 21 ** > > ... ..( 4 skipped). .. ** > > 103 2020-03-20 05:11 21 ** > > 104 2020-03-20 06:10 20 * > > ... ..( 11 skipped). .. * > > 116 2020-03-20 17:58 20 * > > 117 2020-03-20 18:57 21 ** > > 118 2020-03-20 19:56 21 ** > > 119 2020-03-20 20:55 21 ** > > > > SCT Error Recovery Control: > > Read: Disabled > > Write: Disabled > > OUCH! AGAIN! > > > > Device Statistics (GP Log 0x04) > > Page Offset Size Value Flags Description > > 0x01 ===== = = === == General Statistics (rev 1) == > > 0x01 0x008 4 5 --- Lifetime Power-On Resets > > 0x01 0x010 4 506 --- Power-on Hours > > 0x01 0x018 6 8177237744 --- Logical Sectors Written > > 0x01 0x020 6 32254131 --- Number of Write Commands > > 0x01 0x028 6 5818370805 --- Logical Sectors Read > > 0x01 0x030 6 24397122 --- Number of Read Commands > > 0x01 0x038 6 - --- Date and Time TimeStamp > > 0x03 ===== = = === == Rotating Media Statistics (rev 1) == > > 0x03 0x008 4 159 --- Spindle Motor Power-on Hours > > 0x03 0x010 4 10 --- Head Flying Hours > > 0x03 0x018 4 284 --- Head Load Events > > 0x03 0x020 4 0 --- Number of Reallocated Logical Sectors > > 0x03 0x028 4 0 --- Read Recovery Attempts > > 0x03 0x030 4 0 --- Number of Mechanical Start Failures > > 0x03 0x038 4 0 --- Number of Realloc. Candidate > > Logical Sectors > > 0x03 0x040 4 45 --- Number of High Priority Unload Events > > 0x04 ===== = = === == General Errors Statistics (rev 1) == > > 0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors > > 0x04 0x010 4 2 --- Resets Between Cmd Acceptance and > > Completion > > 0x05 ===== = = === == Temperature Statistics (rev 1) == > > 0x05 0x008 1 23 --- Current Temperature > > 0x05 0x010 1 20 --- Average Short Term Temperature > > 0x05 0x018 1 - --- Average Long Term Temperature > > 0x05 0x020 1 30 --- Highest Temperature > > 0x05 0x028 1 0 --- Lowest Temperature > > 0x05 0x030 1 27 --- Highest Average Short Term Temperature > > 0x05 0x038 1 14 --- Lowest Average Short Term Temperature > > 0x05 0x040 1 - --- Highest Average Long Term Temperature > > 0x05 0x048 1 - --- Lowest Average Long Term Temperature > > 0x05 0x050 4 0 --- Time in Over-Temperature > > 0x05 0x058 1 70 --- Specified Maximum Operating Temperature > > 0x05 0x060 4 0 --- Time in Under-Temperature > > 0x05 0x068 1 0 --- Specified Minimum Operating Temperature > > 0x06 ===== = = === == Transport Statistics (rev 1) == > > 0x06 0x008 4 101 --- Number of Hardware Resets > > 0x06 0x010 4 17 --- Number of ASR Events > > 0x06 0x018 4 0 --- Number of Interface CRC Errors > > |||_ C monitored condition met > > ||__ D supports DSN > > |___ N normalized value > > > > SATA Phy Event Counters (GP Log 0x11) > > ID Size Value Description > > 0x000a 2 34 Device-to-host register FISes sent due to a COMRESET > > 0x0001 2 0 Command failed due to ICRC error > > 0x0003 2 0 R_ERR response for device-to-host data FIS > > 0x0004 2 0 R_ERR response for host-to-device data FIS > > 0x0006 2 0 R_ERR response for device-to-host non-data FIS > > 0x0007 2 0 R_ERR response for host-to-device non-data FIS > > > > smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-64-generic] (local build) > > Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org > > > Oh My God. > > This array is just asking for disaster. Whoops, you've just had one, sorry. > > I'm looking for details of your two failed drives, but I don't seem able > to find any. But as soon as you can get the array back, you need to fix > those problems ASAP!!! > > Firstly, get rid of that Green!!! Were the two failed drives greens? > Read the timeout page to find out why. > > https://raid.wiki.kernel.org/index.php/Timeout_Mismatch > > That will hopefully also fix the problem with those Reds with ERC > disabled. It would not surprise me in the slightest if this is what has > done the damage to your array. > > Lastly, those ST4000s. Are they Ironwolves? I guess they're good drives, > but they've just trashed your raid-6 redundancy - lose just one of them > and your array is teetering on the edge. You need to get your sdx2 > partitions copied on to new drives ASAP. > > What I'd do is get a couple more ST4000s, and use them, creating 4GB > partitions. Then take your existing ST4000s, and convert them to 4GB > partitions. At which point you only need five more ST4000s to move your > array on to new drives. > > I'm not sure how you get there - once you've got your 9 4GB drives you > *may* be able to just fail and remove the remaining 2GB drives. > Otherwise, I'd use the freed-up 2GB drives to create 4GB raid-0s. You'd > end up having to buy a couple of spare 4GB drives to move the entire > array on to 4GB "drives", but then you could remove the raid-0 arrays. > > Cheers, > Wol