3 drive RAID5 with 1 bad drive, 1 drive active but not clean and a single clean drive

Hello,

So, I have a bad situation. I run a RAID5 array with three drives, and I noticed one had fallen out of the array. I need to set up better monitoring, because it turns out this happened quite some time ago (back in November!). That leaves two drives and put me into a bit of a scare, so I decided to move the important data off to a different array. I created a directory in /mnt, made a new LVM volume, formatted it as ext4, and started syncing things over... except I forgot to actually mount it, so all that data was syncing right back into the bad array (whoops!).

The server died this morning. I assume the extra stress may have done something to the drives, or perhaps it filled up root and panicked; in either case I could not boot. I set up a fresh OS on my other RAID array, installed some tools, and now I am trying to assemble the bad array well enough to pull my data out. My data lives in an LVM volume inside the RAID array.
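
To reconstruct the mistake, what I did was roughly the following (from memory; the VG/LV names and sizes here are placeholders, the real ones are on the dead install):

```
# Roughly what I did, from memory -- VG/LV names and sizes are placeholders.
mkdir /mnt/backup
lvcreate -L 500G -n backup vg_other          # new LV on the other (healthy) array
mkfs.ext4 /dev/vg_other/backup
# ...and the step I forgot:
# mount /dev/vg_other/backup /mnt/backup
rsync -a /data/important/ /mnt/backup/       # so this wrote into /mnt/backup on the
                                             # root filesystem, i.e. straight back
                                             # onto the failing array
```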

I have attached the --examine output and mdadm details for the drives; as you can see, sdd has a really old update time, and that drive was throwing lots of I/O errors. So I want to use sde and sdf to assemble a read-only array, activate the LVM, mount it, and copy my important data off. Those two drives are 185 events apart, so I assume there will be some slight data corruption, but I am hoping it's mostly fine and mostly just part of my bad rsync.
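
For clarity, the end state I am aiming for is roughly this sequence once md1 can be brought up read-only (the VG/LV names are placeholders, since the real ones are stuck on the dead install):

```
# Planned recovery, everything read-only; VG/LV names are placeholders.
mdadm --assemble --readonly /dev/md1 /dev/sde2 /dev/sdf2   # degraded, 2 of 3 members
vgscan                                     # let LVM find the VG inside md1
vgchange -ay vg_data                       # activate it
mkdir -p /mnt/recovery
mount -o ro /dev/vg_data/lv_data /mnt/recovery
rsync -a /mnt/recovery/important/ /mnt/goodarray/   # copy off to the healthy array
```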

Unfortunately, I don't know what mdadm version was used to create this array, or what OS version it ran under, as all of that is stuck on the dead array. Here is what I am running on the fresh install I am using to try to recover the data:
Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
mdadm - v4.1 - 2018-10-01



The initial -D looked like this:

```
root@kglhost-1:~# mdadm -D /dev/md1
/dev/md1:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 3
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 3

              Name : blah:1
              UUID : fba7c062:e352fa39:fdc09bf9:e21c4617
            Events : 18094545

    Number   Major   Minor   RaidDevice

       -       8       82        -        /dev/sdf2
       -       8       66        -        /dev/sde2
       -       8       50        -        /dev/sdd2
```

I tried to run the array but that failed:

```
# mdadm -o -R  /dev/md1
mdadm: failed to start array /dev/md/1: Input/output error
```

In dmesg, it says:

```
[Tue Feb  2 21:14:42 2021] md: kicking non-fresh sdd2 from array!
[Tue Feb  2 21:14:42 2021] md: kicking non-fresh sdf2 from array!
[Tue Feb  2 21:14:42 2021] md/raid:md1: device sde2 operational as raid disk 1
[Tue Feb  2 21:14:42 2021] md/raid:md1: not enough operational devices (2/3 failed)
[Tue Feb  2 21:14:42 2021] md/raid:md1: failed to run raid set.
[Tue Feb  2 21:14:42 2021] md: pers->run() failed ...
```

That attempt left the array looking like this:

```
# mdadm -D /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Thu Jul 30 21:34:20 2015
        Raid Level : raid5
     Used Dev Size : 489615360 (466.93 GiB 501.37 GB)
      Raid Devices : 3
     Total Devices : 1
       Persistence : Superblock is persistent

       Update Time : Tue Feb  2 13:55:02 2021
             State : active, FAILED, Not Started
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : unknown

              Name : blah:1
              UUID : fba7c062:e352fa39:fdc09bf9:e21c4617
            Events : 18094730

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed

       -       8       66        1      sync   /dev/sde2
```
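
The kernel kicking sdd2 and sdf2 as "non-fresh" lines up with the event counters in the superblocks; for reference, this is roughly the comparison I ran (full output is in the attached examine files):

```
# Pull the update time and event count out of each member's superblock
mdadm --examine /dev/sdd2 /dev/sde2 /dev/sdf2 | grep -E '/dev/sd|Update Time|Events'
```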

I was hoping that --assume-clean might be helpful, but it seems I can't use that option in assemble mode:

```
# mdadm --assemble --assume-clean -o  /dev/md1 /dev/sde /dev/sdf
mdadm: :option --assume-clean not valid in assemble mode
```

So I tried a normal assemble, but there were not enough drives to start the array:

```
# mdadm --assemble -o  /dev/md1 /dev/sde2 /dev/sdf2
mdadm: /dev/md1 assembled from 1 drive - not enough to start the array.
```


/proc/mdstat looked like this a while ago:

```
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdf2[2](S) sde2[1](S)
      979230720 blocks super 1.2

md2 : active raid5 sdb1[1] sda1[0] sdc1[2]
      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 1/15 pages [4KB], 65536KB chunk

md0 : active raid1 sde1[4]
      242624 blocks super 1.0 [3/1] [_U_]

unused devices: <none>
```

Now it looks like this:
```
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : inactive sdf2[2](S) sde2[1](S)
      979230720 blocks super 1.2

md2 : active raid5 sdb1[1] sda1[0] sdc1[2]
      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

md0 : active raid1 sde1[4]
      242624 blocks super 1.0 [3/1] [_U_]

unused devices: <none>
```


I am really concerned about --force...  and https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn does nothing to alleviate those fears.
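
For concreteness, what I think the forced assemble would look like is something like this (not run yet):

```
# NOT run yet -- this is the command I am nervous about.
# My understanding is that --force tells mdadm to ignore the event-count
# mismatch on sdf2 and treat it as usable, and --readonly keeps md from
# writing anything (no resync) while I copy data off.
mdadm --stop /dev/md1                      # clear the current inactive state
mdadm --assemble --force --readonly /dev/md1 /dev/sde2 /dev/sdf2
```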

Anyone have suggestions on what to do next?
Thanks!

Attachment: examine_initial
Description: Binary data

Attachment: mdadm_all_sde
Description: Binary data

Attachment: mdadm_all_sdf
Description: Binary data

Attachment: mdadm_all_sdd
Description: Binary data

