Re: "md/raid:mdX: cannot start dirty degraded array."

Andreas,

LVM RAID and MD RAID use different, and hence incompatible, superblock formats, so you can't switch to MD RAID in this case.

Try activating your RaidLV with 'lvchange -ay --activationmode=degraded /dev/vg_ssds_0/host-home', add a replacement PV of adequate size if none is available, and run 'lvconvert --repair /dev/vg_ssds_0/host-home'.
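
For illustration only, with /dev/sdX standing in for whatever replacement
disk you end up using, the sequence would roughly be:

# lvchange -ay --activationmode degraded /dev/vg_ssds_0/host-home
# pvcreate /dev/sdX
# vgextend vg_ssds_0 /dev/sdX
# lvconvert --repair /dev/vg_ssds_0/host-home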

Best,
Heinz

On Fri, Oct 29, 2021 at 1:07 PM Andreas Trottmann <andreas.trottmann@xxxxxxxxxxx> wrote:
Am 11.10.21 um 16:08 schrieb Andreas Trottmann:

> I am running a server that runs a number of virtual machines and manages their virtual disks as logical volumes using lvmraid (...)

> After a restart, all of the logical volumes came back, except one.

> When I'm trying to activate it, I get:
>
> # lvchange -a y /dev/vg_ssds_0/host-home
>    Couldn't find device with uuid 8iz0p5-vh1c-kaxK-cTRC-1ryd-eQd1-wX1Yq9.
>    device-mapper: reload ioctl on  (253:245) failed: Input/output error


I am replying to my own e-mail here in order to document how I got the
data back, in case someone in a similar situation finds this mail when
searching for the symptoms.

First: I did *not* succeed in activating the lvmraid volume. No matter
how I tried to modify the _rmeta volumes, I always got "reload ioctl
(...) failed: Input/output error" from "lvchange", and "cannot start
dirty degraded array" in dmesg.

So, I used "lvchange -a y /dev/vg_ssds_0/host-home_rimage_0" (and
_rimage_2 and _rimage_3, as those were the ones that were *not* on the
failed PV) to get access to the individual RAID SubLVs. I then used "dd
if=/dev/vg_ssds_0/host-home_rimage_0 of=/mnt/space/rimage_0" to copy the
data to a file on a filesystem with enough space. I repeated this with 2
and 3 as well. I then used losetup to access /mnt/space/rimage_0 as
/dev/loop0, rimage_2 as loop2, and rimage_3 as loop3.
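
In sketch form, with the loop device numbers chosen to match the SubLV
numbers:

# for i in 0 2 3; do dd if=/dev/vg_ssds_0/host-home_rimage_$i of=/mnt/space/rimage_$i; done
# for i in 0 2 3; do losetup /dev/loop$i /mnt/space/rimage_$i; done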

Now I wanted to use mdadm to "build" the RAID, i.e. to treat it as an
array that doesn't have per-device metadata (superblocks):

# mdadm --build /dev/md0 -n 4 -c 128 -l 5 --assume-clean --readonly \
  /dev/loop0 missing /dev/loop2 /dev/loop3

However, this failed with "mdadm: Raid level 5 not permitted with --build".

("-c 128" was the chunk size used when creating the lvmraid, "-n 4" and
"-l 5" refer to the number of devices and the raid level)

I then read the man page about the "superblocks", and found out that the
"1.0" style of RAID metadata (selected with an mdadm "-e 1.0" option)
places a superblock at the end of the device. Some experimenting on
unused devices showed that the size used for actual data was the size of
the block device minus 144 KiB (possibly 144 KiB = 128 KiB (chunk size) +
8 KiB (superblock) + 8 KiB (bitmap)). So I added 147456
zero bytes at the end of each file:

# for i in 0 2 3; do head -c 147456 /dev/zero >> /mnt/space/rimage_$i; done
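
If you want to double-check the padding, each file should now be exactly
147456 bytes larger than the corresponding _rimage LV, for example:

# blockdev --getsize64 /dev/vg_ssds_0/host-home_rimage_0
# stat -c %s /mnt/space/rimage_0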

After detaching and re-attaching the loop devices, I ran

# mdadm --create /dev/md0 -n 4 -c 128 -l 5 -e 1.0 --assume-clean \
  /dev/loop0 missing /dev/loop2 /dev/loop3

(substituting "missing" in the place where the missing RAID SubLV would
have been)

And, voilà: /dev/md0 was perfectly readable, fsck showed no errors, and
it could be mounted correctly, with all data intact.
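
If you attempt something similar, it is safest to keep everything
read-only until the data has been verified, for example (with
/mnt/recovered as a placeholder mount point):

# fsck -n /dev/md0
# mount -o ro /dev/md0 /mnt/recovered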



Kind regards

--
Andreas Trottmann
Werft22 AG
Tel    +41 (0)56 210 91 32
Fax    +41 (0)56 210 91 34
Mobile +41 (0)79 229 88 55


_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
