Hello,

So, I have a bad situation. I run a RAID5 array with 3 drives, and I noticed that one had fallen out of the array. It turns out this happened quite some time ago (back in November!), so I clearly need to set up better monitoring. That left two drives and gave me a bit of a scare, so I decided to move the important data off to a different array. I created a directory in /mnt, made a new LVM volume, formatted it as ext4 and started syncing things over... but I forgot to actually mount it, so all that data was syncing right back into the bad array (whoops!).

The server died this morning. I assume the extra stress did something to the drives, or perhaps it filled up root and panicked. Either way, it would not boot. I set up a fresh OS on my other RAID array, installed some tools, and now I am trying to assemble the bad array well enough to pull my data off. My data lives in an LVM volume inside the RAID array.

I have attached the examine and mdadm output for the drives. As you can see, sdd has a really old update time; this drive was having lots of I/O errors. So I want to use sde and sdf to assemble a read-only array, activate the LVM, mount it and copy my important data off. Their event counts differ by 185, so I assume there will be some slight data corruption, but I am hoping it's mostly fine and limited to my bad rsync.

Unfortunately I don't know which mdadm version was used to create this array, or which OS version, as that's all stuck on the dead array.
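To compare how far each member lags, here is a minimal sketch that reads mdadm --examine output and prints every member's Events count and its distance from the freshest member. It assumes the usual --examine layout (a "/dev/sdX:" header line per device followed by an "Events : N" field); the function name summarize_events is hypothetical and not part of mdadm.

```sh
# Summarize per-member event counts from `mdadm --examine` output on stdin.
# summarize_events is a hypothetical helper name, not an mdadm command.
summarize_events() {
    awk '
        # A header line such as "/dev/sde2:" starts the section for one member.
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # "         Events : 18094730" -> the count is the third field.
        /^ *Events :/ { events[dev] = $3 + 0; order[++n] = dev }
        END {
            # Find the highest (freshest) event count.
            max = 0
            for (i = 1; i <= n; i++)
                if (events[order[i]] > max) max = events[order[i]]
            # Report each member and how many events it is behind.
            for (i = 1; i <= n; i++)
                printf "%s events=%d behind=%d\n", order[i], events[order[i]], max - events[order[i]]
        }'
}
```

Usage: mdadm --examine /dev/sdd2 /dev/sde2 /dev/sdf2 | summarize_events. Both mdadm --examine and this filter only read metadata; nothing is written to the disks.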
Here is what I am running on the new install I am using to recover the data:

Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
mdadm - v4.1 - 2018-10-01

The initial mdadm -D output looked like this:

```
root@kglhost-1:~# mdadm -D /dev/md1
/dev/md1:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 3
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 3

              Name : blah:1
              UUID : fba7c062:e352fa39:fdc09bf9:e21c4617
            Events : 18094545

    Number   Major   Minor   RaidDevice

       -       8       82        -        /dev/sdf2
       -       8       66        -        /dev/sde2
       -       8       50        -        /dev/sdd2
```

I tried to run the array, but that failed:

```
# mdadm -o -R /dev/md1
mdadm: failed to start array /dev/md/1: Input/output error
```

In dmesg it says:

```
[Tue Feb 2 21:14:42 2021] md: kicking non-fresh sdd2 from array!
[Tue Feb 2 21:14:42 2021] md: kicking non-fresh sdf2 from array!
[Tue Feb 2 21:14:42 2021] md/raid:md1: device sde2 operational as raid disk 1
[Tue Feb 2 21:14:42 2021] md/raid:md1: not enough operational devices (2/3 failed)
[Tue Feb 2 21:14:42 2021] md/raid:md1: failed to run raid set.
[Tue Feb 2 21:14:42 2021] md: pers->run() failed ...
```

That made the array look like this:

```
# mdadm -D /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Thu Jul 30 21:34:20 2015
        Raid Level : raid5
     Used Dev Size : 489615360 (466.93 GiB 501.37 GB)
      Raid Devices : 3
     Total Devices : 1
       Persistence : Superblock is persistent

       Update Time : Tue Feb 2 13:55:02 2021
             State : active, FAILED, Not Started
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : unknown

              Name : blah:1
              UUID : fba7c062:e352fa39:fdc09bf9:e21c4617
            Events : 18094730

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed

       -       8       66        1      sync   /dev/sde2
```

I was hoping that --assume-clean might help, but it seems I can't use that option when assembling:

```
# mdadm --assemble --assume-clean -o /dev/md1 /dev/sde /dev/sdf
mdadm: :option --assume-clean not valid in assemble mode
```

So I tried a plain assemble, but it does not find enough drives to start the array:

```
# mdadm --assemble -o /dev/md1 /dev/sde2 /dev/sdf2
mdadm: /dev/md1 assembled from 1 drive - not enough to start the array.
```

mdstat looked like this a while ago:

```
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdf2[2](S) sde2[1](S)
      979230720 blocks super 1.2

md2 : active raid5 sdb1[1] sda1[0] sdc1[2]
      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 1/15 pages [4KB], 65536KB chunk

md0 : active raid1 sde1[4]
      242624 blocks super 1.0 [3/1] [_U_]

unused devices: <none>
```

Now it looks like this:

```
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdf2[2](S) sde2[1](S)
      979230720 blocks super 1.2

md2 : active raid5 sdb1[1] sda1[0] sdc1[2]
      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

md0 : active raid1 sde1[4]
      242624 blocks super 1.0 [3/1] [_U_]

unused devices: <none>
```

I am really concerned about --force, and https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn does nothing to alleviate those fears. Does anyone have suggestions on what to do next? Thanks!
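On the monitoring point: mdadm's own monitor mode (mdadm --monitor --scan, with a MAILADDR line in mdadm.conf) is the built-in way to get alerts. As a minimal supplementary sketch, the awk check below flags any array that /proc/mdstat lists as inactive, or whose member-status string (such as [UUU]) contains an underscore for a missing member. Run against the mdstat output above, it would report md1 as inactive and md0 as degraded [_U_]. The function name check_mdstat is hypothetical.

```sh
# Minimal degraded-array check over /proc/mdstat (or a file given as $1).
# check_mdstat is a hypothetical helper name, not an mdadm command.
check_mdstat() {
    awk '
        # An array header looks like "md1 : inactive sdf2[2](S) sde2[1](S)"
        # or "md2 : active raid5 sdb1[1] ...".
        /^md[0-9]+ :/ {
            name = $1
            if ($3 == "inactive") print name ": inactive"
        }
        # The member status string, e.g. [UUU] or [_U_]; "_" marks a missing member.
        match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name ": degraded " status
        }
    ' "${1:-/proc/mdstat}"
}
```

A cron job could run check_mdstat and mail any non-empty output; it only reads /proc/mdstat and changes nothing.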
Attachments (binary data): examine_initial, mdadm_all_sde, mdadm_all_sdf, mdadm_all_sdd