Hi everybody, I need some help to regain access to a very big RAID
volume. Remote backups exist for the most important part of the data
on it. The RAID volume is completely locked, but nothing has been
destroyed, so I hope you can help me.
No real change has been written to the disks: the reshape (from 3 to 4
disks) never actually started (it was stuck at 0%). No disk is
defective, and I don't think anything wrong or stupid has been
attempted. There was no power failure either.
I saved the information that was available in dmesg (some lines below).
I also used --backup-file=/root/grow_md0.bak, but the file does not seem
to contain anything useful (3 149 824 null bytes), as far as I can tell.
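The "only null bytes" observation can be double-checked with a small helper. This is a minimal sketch: the function name is_all_nul is made up for illustration, and only standard tr and wc are assumed.

```shell
# Succeeds (exit status 0) when the file contains nothing but NUL bytes.
# Note that an empty file also counts as "all NUL" here.
is_all_nul() {
    # Delete every NUL byte and count what is left.
    remaining=$(tr -d '\000' < "$1" | wc -c)
    [ "$remaining" -eq 0 ]
}

# Example (path taken from the message above):
#   is_all_nul /root/grow_md0.bak && echo "only NUL bytes"
```

The check only reads the file, so it should be safe to run on the backup file itself.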
I have a RAID5 array, which was made of 3 disks: /dev/sdb1, /dev/sdd1
and /dev/sde1.
Each one is 8 TB.
sdc1 is a new disk (I created a GPT partition table and an empty
partition, like I always did before placing a disk into a RAID). Then I
ran:
* mdadm --add /dev/md0 /dev/sdc1
* mdadm --detail /dev/md0 (fine, the new disk was shown as spare)
* mdadm --grow --raid-devices=4 --backup-file=/root/grow_md0.bak /dev/md0
When running mdadm --detail /dev/md0, everything seemed fine, and it was
showing:
State : clean, reshaping
Reshape Status : 0% complete
But there was no activity on any disk (just a few bytes read when I was
reading some file), even after 10 minutes (according to bwm-ng and the
HDD LED). This question describes exactly what happened to me:
https://serverfault.com/questions/814025/mdadm-reshape-raid6-does-not-start
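Besides mdadm --detail, the kernel exposes the reshape state in sysfs, which gives a second view of whether anything is progressing. The sketch below only reads values. The helper name md_sync_status is an assumption for illustration, and the default path assumes the array is md0 as in this message.

```shell
# Print the md sync/reshape state found in a sysfs "md" directory.
# $1 defaults to /sys/block/md0/md (md0 is the array from this message).
md_sync_status() {
    dir=${1:-/sys/block/md0/md}
    for attr in sync_action sync_completed sync_max reshape_position; do
        if [ -r "$dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dir/$attr")"
        else
            printf '%s: (not available)\n' "$attr"
        fi
    done
}

# Example:
#   md_sync_status                      # inspects /sys/block/md0/md
#   md_sync_status /sys/block/md1/md    # another array (hypothetical)
```

Comparing sync_completed over a few minutes would show whether the reshape is moving at all.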
So I stopped the services that were using the mount point and ran umount
/media/RAID-VOLUME.
Still no activity, so I ran mdadm --stop /dev/md0 and restarted the
computer.
I can no longer assemble the array, with or without the backup file,
with or without the new disk /dev/sdc1:
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm: Failed to restore critical section for reshape, sorry.
Attached are the commands and their results, and just below are some
things that may be useful to re-enable access to the data.
Can someone help me recover from this situation?
If something risky has to be attempted, I can remove the new disk,
keeping only 3 drives in place. Then I would have 2 spare drives (out of
3) that should be enough to back up at least 2 disks before attempting
anything risky... With a 3-disk RAID5, having 2 disks in perfect
condition should be enough to recover everything, shouldn't it?
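The reasoning behind "2 disks out of 3 are enough" is the general RAID5 parity property: each stripe's parity is the XOR of its data chunks, so any one missing chunk can be recomputed from the others. The sketch below uses single bytes with made-up values. It illustrates the principle only and says nothing about the current on-disk state of this array.

```shell
# Two data chunks of one stripe, shown as single bytes for brevity
# (illustrative values, not taken from the array).
d0=$(( 0xA5 ))
d1=$(( 0x3C ))

# RAID5 parity is the XOR of the data chunks in the stripe.
parity=$(( d0 ^ d1 ))

# If the member holding d1 is lost, XOR of the survivors gives it back.
rebuilt_d1=$(( d0 ^ parity ))

printf 'd1=0x%02X rebuilt=0x%02X\n' "$d1" "$rebuilt_d1"
```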
Thank you in advance!
Julien
Apr 30 02:48:37 Pix-Server-Sorel kernel: [47933.831101] md: bind<sdc1>
Apr 30 02:48:37 Pix-Server-Sorel kernel: [47934.106022] RAID conf printout:
Apr 30 02:48:37 Pix-Server-Sorel kernel: [47934.106025] --- level:5 rd:3 wd:3
Apr 30 02:48:37 Pix-Server-Sorel kernel: [47934.106028] disk 0, o:1, dev:sdd1
Apr 30 02:48:37 Pix-Server-Sorel kernel: [47934.106029] disk 1, o:1, dev:sde1
Apr 30 02:48:37 Pix-Server-Sorel kernel: [47934.106031] disk 2, o:1, dev:sdb1
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904011] RAID conf printout:
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904014] --- level:5 rd:4 wd:4
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904016] disk 0, o:1, dev:sdd1
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904017] disk 1, o:1, dev:sde1
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904018] disk 2, o:1, dev:sdb1
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904019] disk 3, o:1, dev:sdc1
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904087] md: reshape of RAID array md0
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904090] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904092] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Apr 30 02:49:26 Pix-Server-Sorel kernel: [47982.904096] md: using 128k window, over a total of 7813894144k.
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48773.766672] md: md0: reshape interrupted.
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48773.827995] md: reshape of RAID array md0
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48773.827997] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48773.827999] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48773.828021] md: using 128k window, over a total of 7813894144k.
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.027993] md: md0: reshape interrupted.
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.112612] md0: detected capacity change from 16002855206912 to 0
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.112850] md: md0 stopped.
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.112860] md: unbind<sdc1>
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.132027] md: export_rdev(sdc1)
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.132073] md: unbind<sdb1>
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.148016] md: export_rdev(sdb1)
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.148261] md: unbind<sde1>
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.164018] md: export_rdev(sde1)
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.164268] md: unbind<sdd1>
Apr 30 03:02:37 Pix-Server-Sorel kernel: [48774.200025] md: export_rdev(sdd1)
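The sizes in this log can be cross-checked against the Array Size reported by mdadm --examine further down. The sketch below assumes the usual RAID5 rule (usable size = (devices - 1) x per-device size) and that the kernel's "k" means KiB. The helper name raid5_size_kib is made up for illustration.

```shell
# Value copied from the kernel log above.
per_device_kib=7813894144   # "over a total of 7813894144k"

# RAID5 usable size = (number of devices - 1) * per-device size.
raid5_size_kib() {
    # $1 = number of devices, $2 = per-device size in KiB
    echo $(( ($1 - 1) * $2 ))
}

three_disk_bytes=$(( $(raid5_size_kib 3 "$per_device_kib") * 1024 ))
four_disk_kib=$(raid5_size_kib 4 "$per_device_kib")

echo "3-device capacity: $three_disk_bytes bytes"
echo "4-device capacity: $four_disk_kib KiB"
```

The 3-device figure comes out as 16002855206912 bytes, the capacity the kernel reported just before md0 was stopped. The 4-device figure comes out as 23441682432 KiB, the Array Size now recorded in the superblocks. That appears consistent with the metadata already describing the 4-device layout while the running array still had its 3-device size.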
root@Pix-Server-Sorel:/home/user# mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm: Failed to restore critical section for reshape, sorry.
Possibly you needed to specify the --backup-file
root@Pix-Server-Sorel:/home/user# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@Pix-Server-Sorel:/home/user# mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdd1 /dev/sde1
mdadm: Failed to restore critical section for reshape, sorry.
Possibly you needed to specify the --backup-file
root@Pix-Server-Sorel:/home/user# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@Pix-Server-Sorel:/home/user# mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 --backup-file /root/grow_md0.bak
mdadm: Failed to restore critical section for reshape, sorry.
root@Pix-Server-Sorel:/home/user# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@Pix-Server-Sorel:/home/user# mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdd1 /dev/sde1 --backup-file /root/grow_md0.bak
mdadm: Failed to restore critical section for reshape, sorry.
root@Pix-Server-Sorel:/home/user# mdadm --examine --scan --verbose
ARRAY /dev/md/0 level=raid5 metadata=1.2 num-devices=4 UUID=293c6b6c:de6abd61:0a546f46:9996ba16 name=Pix-Server-Sorel:0
devices=/dev/sdc1,/dev/sde1,/dev/sdb1,/dev/sdd1
root@Pix-Server-Sorel:/home/user# mdadm --examine /dev/md0
root@Pix-Server-Sorel:/home/user# mdadm --examine /dev/sdb
/dev/sdb:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
root@Pix-Server-Sorel:/home/user# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 293c6b6c:de6abd61:0a546f46:9996ba16
Name : Pix-Server-Sorel:0 (local to host Pix-Server-Sorel)
Creation Time : Sat Mar 17 22:18:02 2018
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 15627788288 (7451.91 GiB 8001.43 GB)
Array Size : 23441682432 (22355.73 GiB 24004.28 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 656151c6:a45bd737:d6099641:520ed472
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 0
Delta Devices : 1 (3->4)
Update Time : Tue Apr 30 03:02:37 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : e8cc435a - correct
Events : 80978
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@Pix-Server-Sorel:/home/user# mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 293c6b6c:de6abd61:0a546f46:9996ba16
Name : Pix-Server-Sorel:0 (local to host Pix-Server-Sorel)
Creation Time : Sat Mar 17 22:18:02 2018
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 15627788288 (7451.91 GiB 8001.43 GB)
Array Size : 23441682432 (22355.73 GiB 24004.28 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 1ba9976c:25477f1b:4d8f0f64:5780a217
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 0
Delta Devices : 1 (3->4)
Update Time : Tue Apr 30 03:02:37 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 3a026e0e - correct
Events : 80978
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@Pix-Server-Sorel:/home/user# mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 293c6b6c:de6abd61:0a546f46:9996ba16
Name : Pix-Server-Sorel:0 (local to host Pix-Server-Sorel)
Creation Time : Sat Mar 17 22:18:02 2018
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 15627788288 (7451.91 GiB 8001.43 GB)
Array Size : 23441682432 (22355.73 GiB 24004.28 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 5b2f6332:ade8d470:2a6687eb:4386a7a6
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 0
Delta Devices : 1 (3->4)
Update Time : Tue Apr 30 03:02:37 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 6ba0729c - correct
Events : 80978
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@Pix-Server-Sorel:/home/user# mdadm --examine /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 293c6b6c:de6abd61:0a546f46:9996ba16
Name : Pix-Server-Sorel:0 (local to host Pix-Server-Sorel)
Creation Time : Sat Mar 17 22:18:02 2018
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 15627788288 (7451.91 GiB 8001.43 GB)
Array Size : 23441682432 (22355.73 GiB 24004.28 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 8ca89464:e4353dea:bd1a45f4:8cc7b9a5
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 0
Delta Devices : 1 (3->4)
Update Time : Tue Apr 30 03:02:37 2019
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 1f0a2ee3 - correct
Events : 80978
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@Pix-Server-Sorel:/home/user# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sde1[2](S) sdd1[0](S) sdb1[3](S)
23441682432 blocks super 1.2
unused devices: <none>
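Comparing fields such as Events and Reshape pos'n across the four superblocks by eye is error-prone, so a small filter can help. This is a sketch only: the function name examine_field is an assumption, and it relies on mdadm --examine printing each field as "name : value", as in the output above.

```shell
# Print the value of one field from `mdadm --examine` output read on stdin.
# $1 = field name exactly as mdadm prints it, e.g. "Events".
examine_field() {
    awk -F' : ' -v key="$1" '
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
            if (name == key) { print $2; exit }
        }'
}

# Example, run as root (read-only; device names are the ones in this message):
#   for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
#       printf '%s events=%s\n' "$dev" "$(mdadm --examine "$dev" | examine_field Events)"
#   done
```

In the output above, all four members report Events 80978 and Reshape pos'n 0.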