Re: hung grow

I just realized I forgot to reply-all on this yesterday, so I'm resending
it in case anyone else in the group is interested. The ddrescue on the
failed drive is complete now.

The requested info is pasted below, at least as much of it as I have.

(Side note: Let's just go with stupidity)

Then, depending on your feedback, I'll theoretically want to run:

mdadm --assemble --force --update=revert-reshape /dev/md127 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sdg1 /dev/ddRescuePart

I appreciate all your help on this.

Cheers,
Curt

On Wed, Oct 4, 2017 at 5:53 PM, Phil Turmel <philip@xxxxxxxxxx> wrote:
> Hi Curt,
>
> Let me endorse Wol's prescription, with a few comments:
>
> On 10/04/2017 05:08 PM, Anthony Youngman wrote:
>> On 04/10/17 21:01, Curt wrote:
>
> { Side note: what possessed you to do a grow operation? }
>
>>> I'll be doing a ddrescue on the drives tonight, but will wait till
>>> Phil or someone chimes in with my next steps after I do that.
>
> I haven't seen complete mdadm -E reports for all of these devices, nor
> mdadm -D for the array itself.  Please do so now.  If you have any of
> that from before the crash, please post that too.  Run mdadm -E on the
> two earliest failed drives.
>
Don't hate on me too bad.  I already know I made several very stupid
mistakes along the way.

Here's what I've got; it's probably missing a few things that would be
useful. The array is currently stopped, so I can't get you a current -D,
but here's what I have:

Array Before Grow
/dev/md127:
           Version : 0.90
     Creation Time : Fri Jun 15 15:52:05 2012
        Raid Level : raid6
        Array Size : 9767519360 (9315.03 GiB 10001.94 GB)
     Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
      Raid Devices : 7
     Total Devices : 7
   Preferred Minor : 127
       Persistence : Superblock is persistent

       Update Time : Tue Oct  3 21:13:32 2017
             State : clean, degraded, recovering
    Active Devices : 5
   Working Devices : 7
    Failed Devices : 0
     Spare Devices : 2

            Layout : left-symmetric
        Chunk Size : 64K

Consistency Policy : unknown

    Rebuild Status : 84% complete

              UUID : 714a612d:9bd35197:36c91ae3:c168144d
            Events : 0.11559596

    Number   Major   Minor   RaidDevice State
       0       8       97        0      active sync   /dev/sdg1
       1       8       49        1      active sync   /dev/sdd1
       2       8       33        2      active sync   /dev/sdc1
       3       8        1        3      active sync   /dev/sda1
       4       8       65        4      active sync   /dev/sde1
       8       8       16        5      spare rebuilding   /dev/sdb
       7       8       80        6      spare rebuilding   /dev/sdf


Array After Grow:
mdadm --detail /dev/md127
/dev/md127:
           Version : 0.91
     Creation Time : Fri Jun 15 15:52:05 2012
        Raid Level : raid6
        Array Size : 9767519360 (9315.03 GiB 10001.94 GB)
     Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
      Raid Devices : 8
     Total Devices : 7
   Preferred Minor : 127
       Persistence : Superblock is persistent

       Update Time : Tue Oct  3 23:10:32 2017
             State : clean, FAILED, reshaping
    Active Devices : 5
   Working Devices : 7
    Failed Devices : 0
     Spare Devices : 2

            Layout : left-symmetric
        Chunk Size : 64K

Consistency Policy : unknown

    Reshape Status : 0% complete
     Delta Devices : 1, (7->8)

              UUID : 714a612d:9bd35197:36c91ae3:c168144d
            Events : 0.11559671

    Number   Major   Minor   RaidDevice State
       0       8       97        0      active sync   /dev/sdg1
       1       8       49        1      active sync   /dev/sdd1
       2       8       33        2      active sync   /dev/sdc1
       3       8        1        3      active sync   /dev/sda1
       4       8       65        4      active sync   /dev/sde1
       5       8       16        5      spare rebuilding   /dev/sdb
       6       8       80        6      spare rebuilding   /dev/sdf
       -       0        0        7      removed
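
A note on the Array Size figures in these reports: for RAID6 the usable
size is (raid devices - 2) times the Used Dev Size, which is why the
per-device superblocks further down report a larger Array Size once
Raid Devices goes from 7 to 8. A minimal sketch of that arithmetic (the
function name is made up for illustration; sizes are in KiB as mdadm
prints them):

```python
def raid6_array_size_kib(raid_devices, used_dev_size_kib):
    """Usable RAID6 capacity: two members' worth of space goes to parity."""
    if raid_devices < 4:
        raise ValueError("RAID6 needs at least 4 devices")
    return (raid_devices - 2) * used_dev_size_kib


# "Used Dev Size" as printed in the reports in this message.
USED_DEV_SIZE_KIB = 1953503872

print(raid6_array_size_kib(7, USED_DEV_SIZE_KIB))  # 7-device layout
print(raid6_array_size_kib(8, USED_DEV_SIZE_KIB))  # 8-device layout after the grow
```

The two results match the 9767519360 and 11721023232 values mdadm shows
for the 7-device and 8-device layouts.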


Here's the few I have from before.  I really shouldn't have been doing
this at 4am.
****************
mdadm --examine /dev/sdf
/dev/sdf:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 9767519360 (9315.03 GiB 10001.94 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 127

    Update Time : Tue Oct  3 22:38:22 2017
          State : clean
 Active Devices : 4
Working Devices : 6
 Failed Devices : 3
  Spare Devices : 2
       Checksum : cdfbf074 - correct
         Events : 11559615

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     7       8       80        7      spare   /dev/sdf

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       0        0        4      faulty removed
   5     5       0        0        5      faulty removed
   6     6       0        0        6      faulty removed
   7     7       8       80        7      spare   /dev/sdf
   8     8       8       16        8      spare   /dev/sdb

mdadm --examine /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 9767519360 (9315.03 GiB 10001.94 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 127

    Update Time : Tue Oct  3 22:38:22 2017
          State : clean
 Active Devices : 4
Working Devices : 6
 Failed Devices : 3
  Spare Devices : 2
       Checksum : cdfbf023 - correct
         Events : 11559615

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8        1        3      active sync   /dev/sda1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       0        0        4      faulty removed
   5     5       0        0        5      faulty removed
   6     6       0        0        6      faulty removed
   7     7       8       80        7      spare   /dev/sdf
   8     8       8       16        8      spare   /dev/sdb

Here are the 3 failed drives. NOTE: I only had one bay available, so
they all show up under the same drive letter.

mdadm --examine /dev/sdz1
/dev/sdz1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 9767519360 (9315.03 GiB 10001.94 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 126

    Update Time : Mon Jul 11 16:54:15 2016
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 0
       Checksum : ca7ec3b0 - correct
         Events : 3397832

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1      65       97        1      active sync

   0     0      65      113        0      active sync
   1     1      65       97        1      active sync
   2     2      65       81        2      active sync
   3     3       0        0        3      faulty removed
   4     4      65       49        4      active sync
   5     5      65       33        5      active sync
   6     6      65       17        6      active sync

**********************THE ONE BELOW I'M DOING A DDRESCUE FROM******

mdadm --examine /dev/sdz1
/dev/sdz1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 9767519360 (9315.03 GiB 10001.94 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 127

    Update Time : Sat Sep  2 01:00:37 2017
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 0
       Checksum : cd217ebc - correct
         Events : 11559404

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       65        5      active sync   /dev/sde1

   0     0       8       81        0      active sync
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync
   3     3      65      129        3      active sync   /dev/sdy1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8       65        5      active sync   /dev/sde1
   6     6       0        0        6      faulty removed



***************

mdadm --examine /dev/sdz1
/dev/sdz1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 9767519360 (9315.03 GiB 10001.94 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 127

    Update Time : Mon Nov  7 02:02:38 2016
          State : active
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : cb1ec57d - correct
         Events : 3652739

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       97        6      active sync   /dev/sdg1

   0     0       8       81        0      active sync
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync
   3     3      65      129        3      active sync   /dev/sdy1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8       65        5      active sync   /dev/sde1
   6     6       8       97        6      active sync   /dev/sdg1

CURRENT EXAMINE
*************************
mdadm -E /dev/sd[acdeg]1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 11721023232 (11178.04 GiB 12002.33 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 127

  Reshape pos'n : 3799296 (3.62 GiB 3.89 GB)
  Delta Devices : 1 (7->8)

    Update Time : Wed Oct  4 12:49:57 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ce71a9cb - correct
         Events : 11559681

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8        1        3      active sync   /dev/sda1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       0        0        5      faulty removed
   6     6       8       16        6      active   /dev/sdb
   7     7       0        0        7      faulty removed
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 11721023232 (11178.04 GiB 12002.33 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 127

  Reshape pos'n : 3799296 (3.62 GiB 3.89 GB)
  Delta Devices : 1 (7->8)

    Update Time : Wed Oct  4 12:49:57 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ce71a9e9 - correct
         Events : 11559681

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       0        0        5      faulty removed
   6     6       8       16        6      active   /dev/sdb
   7     7       0        0        7      faulty removed
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 11721023232 (11178.04 GiB 12002.33 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 127

  Reshape pos'n : 3799296 (3.62 GiB 3.89 GB)
  Delta Devices : 1 (7->8)

    Update Time : Wed Oct  4 12:49:57 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ce71a9f7 - correct
         Events : 11559681

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       49        1      active sync   /dev/sdd1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       0        0        5      faulty removed
   6     6       8       16        6      active   /dev/sdb
   7     7       0        0        7      faulty removed
/dev/sde1:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 11721023232 (11178.04 GiB 12002.33 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 127

  Reshape pos'n : 3799296 (3.62 GiB 3.89 GB)
  Delta Devices : 1 (7->8)

    Update Time : Wed Oct  4 12:49:57 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ce71aa0d - correct
         Events : 11559681

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       65        4      active sync   /dev/sde1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       0        0        5      faulty removed
   6     6       8       16        6      active   /dev/sdb
   7     7       0        0        7      faulty removed
/dev/sdg1:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 11721023232 (11178.04 GiB 12002.33 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 127

  Reshape pos'n : 3799296 (3.62 GiB 3.89 GB)
  Delta Devices : 1 (7->8)

    Update Time : Wed Oct  4 12:49:57 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ce71aa25 - correct
         Events : 11559681

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       97        0      active sync   /dev/sdg1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       0        0        5      faulty removed
   6     6       8       16        6      active   /dev/sdb
   7     7       0        0        7      faulty removed

/dev/sdb:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 714a612d:9bd35197:36c91ae3:c168144d
  Creation Time : Fri Jun 15 15:52:05 2012
     Raid Level : raid6
  Used Dev Size : 1953503872 (1863.01 GiB 2000.39 GB)
     Array Size : 11721023232 (11178.04 GiB 12002.33 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 127

  Reshape pos'n : 3799296 (3.62 GiB 3.89 GB)
  Delta Devices : 1 (7->8)

    Update Time : Wed Oct  4 12:49:57 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
       Checksum : ce71a9dc - correct
         Events : 11559681

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       16        6      active   /dev/sdb

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8        1        3      active sync   /dev/sda1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       0        0        5      faulty removed
   6     6       8       16        6      active   /dev/sdb
   7     7       0        0        7      faulty removed
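
To make the Events counters easier to compare, here is a rough sketch
that pulls the counter out of an `mdadm --examine` report and shows how
far each failed drive lags the current members. The helper names are
hypothetical, the numbers are copied from the reports in this message,
and whether a given lag is safe for --force is a call for the list, not
something this script decides.

```python
import re

# Matches the integer Events line of an `mdadm --examine` report
# (0.90 metadata), e.g. "         Events : 11559404".
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$", re.MULTILINE)


def parse_events(examine_text):
    """Return the Events counter found in one --examine report."""
    match = EVENTS_RE.search(examine_text)
    if match is None:
        raise ValueError("no integer Events line found")
    return int(match.group(1))


def events_behind(newest, candidates):
    """Map each label to how many events it lags the newest superblock."""
    return {label: newest - count for label, count in candidates.items()}


# Values copied from the reports in this message.
current_members = 11559681
failed_drives = {
    "failed drive, last update Jul 2016": 3397832,
    "failed drive, last update Sep 2017 (ddrescue source)": 11559404,
    "failed drive, last update Nov 2016": 3652739,
}

lags = events_behind(current_members, failed_drives)
for label, lag in sorted(lags.items(), key=lambda item: item[1]):
    print(f"{label}: {lag} events behind")
```

The ddrescue source is the only failed drive whose counter is anywhere
near the current members.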

> Post the uncut output inline here on the list, in plain text mode, with
> line wrap disabled, please.
>
>> If you've got enough to ddrescue all of those five original drives, then
>> that's absolutely great.
>>
>> Remember - if we can't get five original drives (or copies thereof) the
>> array is toast.
>>>
>>> lol, chalk one more up for FML. "SCT Error Recovery Control command
>>> not supported".  I'm guessing this is a real bad thing now?  I didn't
>>> buy these drives or originally set it up.
>>>
>> I'm not sure whether this is good news or bad. Actually, it *could* be
>> great news for the rescue! It's bad news for raid though, if you don't
>> deal with it up front - I guess that wasn't done ...
>
> It is mixed news.  It is almost certainly the reason you've had drives
> bumped out of your arrays.  I suspect these drives all report *PASSED*
> from smartctl.  Which means that the drives really are good, just
> suffering from ordinary uncorrected errors.
>
> You'll certainly have to use the 180 second driver timeout work-around
> to get through this crisis.
>
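
For anyone following along: the driver timeout work-around is normally
applied by writing the new value, in seconds, to each disk's
/sys/block/<disk>/device/timeout file. A minimal sketch, assuming the
array members all appear as sd* devices. The function name and its
parameters are made up for illustration, it has to run as root against
the real /sys/block, and the setting does not survive a reboot.

```python
from pathlib import Path


def set_disk_timeouts(seconds=180, sys_block=Path("/sys/block"), prefix="sd"):
    """Write `seconds` to device/timeout for each matching disk.

    Returns the list of timeout files that were written.
    """
    changed = []
    for disk in sorted(Path(sys_block).iterdir()):
        timeout_file = disk / "device" / "timeout"
        # Skip non-matching block devices (loop, md, ...) and any disk
        # that does not expose a timeout file.
        if not disk.name.startswith(prefix) or not timeout_file.exists():
            continue
        timeout_file.write_text(f"{seconds}\n")
        changed.append(timeout_file)
    return changed
```

Calling set_disk_timeouts() with no arguments would raise every sd*
disk to 180 seconds; pass a different sys_block path to try it against
a scratch directory first.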
> In the meantime, please run "smartctl -iA -l scterc" on each of your
> drives, including the failed ones, and post the uncut output here.
> { Include the device name with each }
>
Sorry, I don't have it for the failed ones; I forgot to run it before I
started ddrescue. Here are the current drives:
# smartctl -iA -l scterc /dev/sda
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1ER164
Serial Number:    W4Z14ZNW
LU WWN Device Id: 5 000c50 07d29ef14
Firmware Version: CC25
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct  4 20:28:30 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       11140560
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       827856598
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18858
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       14
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   013   013   000    Old_age   Always       -       87
190 Airflow_Temperature_Cel 0x0022   071   063   045    Old_age   Always       -       29 (Min/Max 29/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       268
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18841h+49m+16.895s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       84336821090
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       4832824497202

SCT Error Recovery Control command not supported

# smartctl -iA -l scterc /dev/sdb
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1ER164
Serial Number:    Z4Z3Y7XM
LU WWN Device Id: 5 000c50 087461756
Firmware Version: CC26
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct  4 20:28:44 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       157161144
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       409701090
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12274
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       15
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   092   092   000    Old_age   Always       -       8
190 Airflow_Temperature_Cel 0x0022   070   065   045    Old_age   Always       -       30 (Min/Max 28/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       142
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       12268h+22m+21.157s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       83831274067
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       124530518173

SCT Error Recovery Control command not supported

# smartctl -iA -l scterc /dev/sdc
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black
Device Model:     WDC WD2002FAEX-007BA0
Serial Number:    WD-WMAY04949787
LU WWN Device Id: 5 0014ee 25c3e0682
Firmware Version: 05.01D05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct  4 20:28:46 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       8041
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21337
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   117   107   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SCT Error Recovery Control command not supported

# smartctl -iA -l scterc /dev/sdd
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black
Device Model:     WDC WD2002FAEX-007BA0
Serial Number:    WD-WMAY04912439
LU WWN Device Id: 5 0014ee 25c3f0960
Firmware Version: 05.01D05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct  4 20:29:33 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       8
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       7950
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21325
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   116   106   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SCT Error Recovery Control command not supported

# smartctl -iA -l scterc /dev/sde
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black
Device Model:     WDC WD2002FAEX-007BA0
Serial Number:    WD-WMAY05040774
LU WWN Device Id: 5 0014ee 2b1938a22
Firmware Version: 05.01D05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct  4 20:29:36 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       8083
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21328
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   116   108   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SCT Error Recovery Control command not supported

# smartctl -iA -l scterc /dev/sdg
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-229.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1ER164
Serial Number:    ZA5029A8
LU WWN Device Id: 5 000c50 0874eb397
Firmware Version: CC26
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Oct  4 20:29:42 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       232755000
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       606406779
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       13052
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       10
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   069   060   045    Old_age   Always       -       31 (Min/Max 29/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       310
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 25 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13039h+51m+52.726s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       33464557056
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2634696436762

SCT Error Recovery Control command not supported


>> Go and read the wiki - the "When Things Go Wrogn" section. That will
>> hopefully help a lot and it explains the Error Recovery stuff (the
>> timeout mismatch page). Fix that problem and your dodgy drives will
>> probably dd without trouble at all.
>
> Let me emphasize this.  The timeout mismatch problem is so prevalent and
> your experience so common that I thought to myself "I bet this one is
> timeout mismatch" when I read your first mail.
>
>> Hopefully with all copied drives, but if you have to mix dd'd and
>> original drives you're probably okay, you should now be able to assemble
>> a working array with five drives by doing an
>
> As already noted, you definitely need to use ddrescue on the third
> drive that failed.  You may also need to ddrescue your four remaining
> good drives if they also have "Pending Sector" counts.
>
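For anyone following along, here is a rough sketch of how the "Pending Sector" check could be scripted instead of eyeballing each report. It assumes the normal unwrapped `smartctl -A` layout, where the raw value is the tenth field. The function names `pending_count` and `report_pending` are made up for illustration.

```shell
# Print the raw Current_Pending_Sector value from `smartctl -A` output on stdin.
# Assumes the unwrapped 10-column attribute table (raw value in field 10).
# Exits non-zero if the attribute is not present at all.
pending_count() {
    awk '$2 == "Current_Pending_Sector" { print $10; found = 1 }
         END { if (!found) exit 1 }'
}

# Hypothetical wrapper: flag every listed drive that reports pending sectors.
# Needs root and smartmontools; drives without the attribute are skipped.
report_pending() {
    for dev in "$@"; do
        n=$(smartctl -A "$dev" | pending_count) || continue
        [ "$n" -gt 0 ] && echo "$dev: $n pending sector(s), consider ddrescue"
    done
    return 0
}
```

It would be called as something like `report_pending /dev/sda /dev/sdc /dev/sdd /dev/sdg`, with your own device names substituted.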
>> mdadm --assemble blah blah blah --update=revert-reshape
>>
>> That will put you back to a "5 drives out of 7" working array. The
>> problem with this is that it will be a degraded, linear array.
>
> This is the correct next step, after all required ddrescues.
>
>> I'm not sure whether a --display will list the failed drives - if it
>> does you can now --remove them. So you'll now have a working, 7-drive
>> array, with two drives missing.
>
> This is the time to grab any backups you need of critical content.  Do
> *not* write to the array at this point.  Get all your data off.
>
> Then:
>
>> Now --add in the two new drives. MAKE SURE you've read the section on
>> timeout mismatch and dealt with it! The rebuild/recovery will ALMOST
>> CERTAINLY FAIL if you don't! Also note that I am not sure about how
>> those drives will display while rebuilding - they may well display as
>> being spares during a rebuild.
>
> The timeout mismatch fixes won't help your case.  You have no redundancy
> left, so the kickout scenarios involved no longer apply.  They applied
> when your first two drives were kicked out.  When timeouts are not
> mismatched, MD raid *fixes* the occasional bad sector.
>
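For reference, a minimal sketch of the usual timeout mismatch workaround, as I understand it from the wiki page. The function name `timeout_fix_cmd` is hypothetical, and the 180 second value is the commonly suggested one rather than anything specific to this array. The function only prints the command it would pick, so nothing is changed until you run that command yourself as root.

```shell
# Decide the timeout-mismatch action for one drive.
#   $1    = kernel device name, e.g. sde
#   stdin = output of `smartctl -l scterc /dev/$1`
# Prints the command to run; it does not execute anything itself.
timeout_fix_cmd() {
    dev=$1
    if grep -q 'SCT Error Recovery Control command not supported'; then
        # No ERC on this drive: make the kernel wait longer than the
        # drive's own internal retries (180 s is an assumed, commonly used value).
        echo "echo 180 > /sys/block/$dev/device/timeout"
    else
        # ERC available: cap read/write recovery at 7.0 s (units of 100 ms).
        echo "smartctl -l scterc,70,70 /dev/$dev"
    fi
}
```

Example: `smartctl -l scterc /dev/sde | timeout_fix_cmd sde`. Every drive in the reports above says ERC is not supported, so they would all take the first branch. As far as I know neither setting survives a reboot, so it would have to go in a boot script.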
>> Lastly, MAKE SURE you set up a regular scrub. There's a distinct
>> possibility that this problem wouldn't have arisen (or would have been
>> found quicker) if a scrub had been in place. And if you can, set up a
>> trigger that emails you the contents of /proc/mdstat every few days.
>> It's far too easy to miss a failed drive if you don't have something
>> shoving it in your face every few days.
>
> If you have a timeout mismatch problem, your array will die much sooner
> with scrubs, because MD raid will fail to fix UREs and the scrub will
> find them right away.
>
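A small sketch of the scrub and /proc/mdstat monitoring idea, for once the timeouts are sorted out. The helper names `mdstat_degraded` and `start_scrub` are made up; the check simply looks for an underscore in the `[UU_U]` style member map that /proc/mdstat prints for each array.

```shell
# Succeed if mdstat-style text on stdin shows a degraded array, i.e. a
# member map such as [UUU_U_U] with "_" marking a missing device.
mdstat_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]'
}

# Hypothetical helper: request a "check" scrub on one array, e.g. md127.
# Needs root; progress then shows up in /proc/mdstat.
start_scrub() {
    echo check > "/sys/block/$1/md/sync_action"
}

# Example cron usage (assumes a working local `mail` command):
#   mdstat_degraded < /proc/mdstat && mail -s "md degraded" root < /proc/mdstat
```

Many distributions already ship a periodic scrub job, so it is worth checking for one before adding your own.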
> But again, get us the detailed reports, and we'll help make sure your
> commands are correct.
>
> Phil


