Re: mdadm reshaping stuck problem

Hello Phil,

Thanks for your fast reply.

I ran your commands; the results are attached to this email and are also on pastebin:

https://pastebin.com/EVpLfmAe
https://pastebin.com/ZMBYB5CW


The drive names have changed because I removed one drive that was not part of the RAID. I had a copy of all my data on that drive, so I'm now trying to recover the data from it. The chances are good because I only overwrote the partition table.



On 03.12.2017 15:17, Phil Turmel wrote:
Hi Rene,

On 12/03/2017 07:47 AM, rene.feistle@xxxxxxxxx wrote:
Hello,

after hours and hours of googling and trying out things, I gave up on
this. This email is my last hope of getting my data back.

I'm worried for you -- "trying out things" can be dangerous.

I have 4*4TB drives installed and want to create a raid 5 with them.

So what I did is create an array of 3 disks (raid 5), copy the data from the 4th drive (I don't have more space available) to the raid and then I
wanted to add the last drive to the raid.

Ok.

I made a mistake here. I accidentally grew the raid to 4 disks with

sudo mdadm --grow --raid-devices=4 /dev/md0 --backup-file=/tmp/md0.bak

BEFORE adding the last drive as a hot spare. Mdadm immediately started a
reshape and says that it failed - because it consists of 4 drives but
only 3 drives are available.

Adding the fourth drive at this point should have enabled the reshape to
resume.

I thought: okay, let it complete the reshape and everything will be
fine. But no, the reshape is stuck at 34.3%.

What I have tried:

- Reboot (about 100 times)
- increase stripe cache size up to 32768

mdadm --assemble --invalid-backup --backup-file=/root/mdadm0.bak
/dev/md0 /dev/sdc1 /dev/sde1 /dev/sdf1

And some other things.

We will probably need you to detail "some other things".

The raid is not mountable. When I try to mount it, the mount command
just hangs and nothing happens. That means that I had to edit my fstab
with a rescue CD because otherwise the system would never boot again.
That also means that I have no access to my data.

When I shut down or reboot the computer, it also hangs at shutdown; I can
only hard reset it.

cat /proc/mdstat:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [r$
md0 : active raid5 sdc1[0] sdf1[3] sde1[1]
      7813771264 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU__]
      [======>..............]  reshape = 34.3% (1340465664/3906885632) finish=3$
      bitmap: 3/30 pages [12KB], 65536KB chunk

unused devices: <none>

Note the "UU__". That means at some point your three-drive array lost a
drive, and the reshape is showing another missing drive.  A
doubly-degraded array cannot run.
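
As a rough illustration of how to read that status field, the sketch below counts the '_' slots and applies the RAID 5 rule that at most one member may be missing. The function names are hypothetical and the status string is simply copied from the mdstat output above; this is not part of mdadm.

```shell
# Count the missing ('_') slots in an mdstat status field such as "[UU__]".
count_missing() {
    # Keep only the underscores, then take the length of what is left.
    underscores=$(printf '%s' "$1" | tr -cd '_')
    echo "${#underscores}"
}

# RAID 5 carries one parity block per stripe, so it survives one missing member.
raid5_can_run() {
    [ "$(count_missing "$1")" -le 1 ]
}

status='[UU__]'   # copied from the mdstat output above
if raid5_can_run "$status"; then
    echo "$status: array can run"
else
    echo "$status: $(count_missing "$status") members missing, array cannot run"
fi
```

For "[UU__]" the count is 2, which is why the array reports FAILED.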

mdadm --detail /dev/md0


/dev/md0:
        Version : 1.2
  Creation Time : Fri Dec  1 02:10:06 2017
     Raid Level : raid5
     Array Size : 7813771264 (7451.79 GiB 8001.30 GB)
  Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Dec  3 13:34:43 2017
          State : active, FAILED, reshaping
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Reshape Status : 34% complete
  Delta Devices : 1, (3->4)

           Name : nas-server:0  (local to host nas-server)
           UUID : e410e68d:76460b65:69c056c0:d2645d55
         Events : 28155

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       65        1      active sync   /dev/sde1
       3       8       81        2      spare rebuilding   /dev/sdf1
       6       0        0        6      removed

Note the "spare rebuilding" on sdf1.  That means at some point sdf1 was
ejected from your array and you --added it back.  A supposition
buttressed by its slot number displayed in mdstat.  sdf1 was already a
critical device, so --add destroyed important data on it.

Any help is appreciated, I'm lost.

With the current status of the array, doubly-degraded with a reshape
quite far along, I am not optimistic for you.  However, you have not
provided all the information that might be helpful here.  Please supply
the output (cat'd to a file, not copied from a narrow terminal, please)
of these commands:

for x in /dev/sd[cef]1 ; do echo $x ; mdadm -E $x ; done

for x in /dev/sd[cef] ; do echo $x ; smartctl -iA -l scterc $x ; done
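
One possible way to get the output of the two loops above into files rather than copying it from a terminal is sketched here. The helper name and the output file names are assumptions made for illustration; the mdadm and smartctl invocations are the ones given above.

```shell
# Run a command once per device and save all output to one file.
# Usage: collect_report OUTFILE COMMAND DEVICE...
# COMMAND is split on spaces, so "mdadm -E" runs `mdadm -E DEVICE`.
collect_report() {
    outfile=$1
    cmd=$2
    shift 2
    for dev in "$@"; do
        echo "$dev"      # label each section with the device name
        $cmd "$dev"      # intentionally unquoted so the options are split
    done > "$outfile" 2>&1
}

# On the affected machine (as root), the calls would look like this:
#   collect_report mdadm-examine.txt "mdadm -E" /dev/sd[cef]1
#   collect_report smart-report.txt "smartctl -iA -l scterc" /dev/sd[cef]
```

Redirecting stderr as well keeps any error messages next to the device they belong to.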

Please make sure your mailer is in plain text mode with line wrap
disabled to ensure the content isn't corrupted when you paste it into
your reply.

Regards,

Phil
/dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : e410e68d:76460b65:69c056c0:d2645d55
           Name : nas-server:0  (local to host nas-server)
  Creation Time : Fri Dec  1 02:10:06 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
     Array Size : 11720656896 (11177.69 GiB 12001.95 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : f4490b9f:475aca83:2ff93d65:a4fabf8c

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 4021168128 (3834.88 GiB 4117.68 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun Dec  3 13:34:43 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : d1b15eed - correct
         Events : 28155

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : e410e68d:76460b65:69c056c0:d2645d55
           Name : nas-server:0  (local to host nas-server)
  Creation Time : Fri Dec  1 02:10:06 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
     Array Size : 11720656896 (11177.69 GiB 12001.95 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : eb679765:5771cc1a:651f5a86:e166424b

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 4021168128 (3834.88 GiB 4117.68 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun Dec  3 13:34:43 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : ede2668 - correct
         Events : 28155

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x7
     Array UUID : e410e68d:76460b65:69c056c0:d2645d55
           Name : nas-server:0  (local to host nas-server)
  Creation Time : Fri Dec  1 02:10:06 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813771264 (3725.90 GiB 4000.65 GB)
     Array Size : 11720656896 (11177.69 GiB 12001.95 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
Recovery Offset : 2680909424 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : 2c5c510d:3ddd9cb3:85782829:cdce1b89

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 4021168128 (3834.88 GiB 4117.68 GB)
  Delta Devices : 1 (3->4)

    Update Time : Sun Dec  3 13:34:43 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : cfdbb60f - correct
         Events : 28155

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
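
All three superblocks above agree on the event count (28155) and on the reshape position. As a sanity check, the small sketch below compares the reshape position with the array size and compares the result with the per-device figures from /proc/mdstat. The pct helper is hypothetical; the numbers are copied from the output in this thread, and each pair is assumed to be in the same unit, as the GiB values printed beside them suggest.

```shell
# Integer percentage with one decimal place: pct POSITION TOTAL
pct() {
    tenths=$(( $1 * 1000 / $2 ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

# Reshape pos'n and Array Size from the mdadm -E output.
pct 4021168128 11720656896

# Per-device progress from the reshape line in /proc/mdstat.
pct 1340465664 3906885632
```

Both calls print 34.3, so the superblocks and mdstat describe the same stalled position.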
/dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-40-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN008-2DR166
Serial Number:    ZDH2M2PV
LU WWN Device Id: 5 000c50 0a5690986
Firmware Version: SC60
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec  3 15:48:15 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   064   044    Pre-fail  Always       -       19890494
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   073   060   045    Pre-fail  Always       -       21251399
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       134 (224 223 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   054   040    Old_age   Always       -       33 (Min/Max 33/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       53
194 Temperature_Celsius     0x0022   033   046   000    Old_age   Always       -       33 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       132 (215 163 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       28126761649
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       13670857188

SCT Error Recovery Control:
           Read:     70 (7,0 seconds)
          Write:     70 (7,0 seconds)

/dev/sdd
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-40-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN008-2DR166
Serial Number:    ZDH2GRNF
LU WWN Device Id: 5 000c50 0a556eef6
Firmware Version: SC60
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec  3 15:48:15 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   079   066   044    Pre-fail  Always       -       83554824
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   045    Pre-fail  Always       -       16531420
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       134 (30 205 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       8
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   058   040    Old_age   Always       -       33 (Min/Max 33/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       79
194 Temperature_Celsius     0x0022   033   042   000    Old_age   Always       -       33 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       132 (251 231 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       16036671217
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       13248987983

SCT Error Recovery Control:
           Read:     70 (7,0 seconds)
          Write:     70 (7,0 seconds)

/dev/sde
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-40-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST4000DM000-1F2168
Serial Number:    Z3018XTT
LU WWN Device Id: 5 000c50 065b12345
Firmware Version: CC54
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec  3 15:48:15 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       56920152
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2179
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   066   060   030    Pre-fail  Always       -       4457875
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18477
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       136
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   051   045    Old_age   Always       -       34 (Min/Max 33/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   085   085   000    Old_age   Always       -       31842
194 Temperature_Celsius     0x0022   034   049   000    Old_age   Always       -       34 (0 10 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5936h+59m+15.445s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       34663048643
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       14331594669

SCT Error Recovery Control command not supported
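
The three SMART reports differ in their last section: sdb and sdd report a 7-second error recovery limit, while sde reports that the SCT ERC command is not supported. A minimal sketch for classifying that section of a saved report follows. The function name is hypothetical, and the matched strings are the ones that appear in the output above.

```shell
# Classify the "-l scterc" section of a smartctl report read from stdin.
erc_status() {
    report=$(cat)
    case $report in
        *"SCT Error Recovery Control command not supported"*)
            echo "unsupported" ;;
        *"SCT Error Recovery Control:"*)
            echo "supported" ;;
        *)
            echo "unknown" ;;
    esac
}

# Sample inputs copied from the reports above.
printf 'SCT Error Recovery Control:\n           Read:     70 (7,0 seconds)\n' | erc_status
printf 'SCT Error Recovery Control command not supported\n' | erc_status
```

Advice commonly given on this list, though not stated in this thread, is that a drive without error recovery control needs a longer kernel command timeout (the value in /sys/block/<device>/device/timeout) than the default, so that a slow internal retry does not get the drive kicked out of the array.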

