Re: request for help on IMSM-metadata RAID-5 array

Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> · Mon, 25 Sep 2023 11:44:20 +0200

On Sat, 23 Sep 2023 12:54:52 +0200
Joel Parthemore <joel@xxxxxxxxxxxxxxx> wrote:

> Apologies in advance for the long email, but I wanted to include 
> everything that is asked for on the "asking for help" page associated 
> with the mailing list. The output from some of the requested commands is 
> pretty lengthy.
> 
> My home directory is on a three-disk RAID-5 array that, for whatever 
> reason (it seemed like a good idea at the time?), I built using the 
> hooks from the UEFI BIOS (or so I understand what I did). That is to 
> say, it's a "real" software-based RAID array in Linux that's built on a 
> "fake" RAID array in the UEFI BIOS. Mostly nothing important is stored 
> on the /home partition, but I forgot to back up a few important things 
> that are (or, at least, were). So I'd like to get the RAID array back if 
> I can, or know if I can't; and I will be extremely grateful to anyone 
> who can tell me one way or the other.
> 
> All was well for some number of years until a few days ago. After I 
> installed the latest KDE updates, the RAID array would lock up entirely 
> when I tried to log in to a new KDE Wayland session. It all came down to 
> one process that refused to die, running startplasma-wayland. Because 
> the process refused to die, the RAID array could not be stopped cleanly 
> and rebooting the computer therefore caused the RAID array to go out of 
> sync. After that, any attempt whatsoever to access the RAID array would 
> cause the RAID array to lock up again.
> 
> The first few times this happened, I was able to start the computer 
> without starting the RAID array, reassemble the RAID array using the 
> command mdadm --assemble --run --force /dev/md126 /dev/sda /dev/sde 
> /dev/sdc and have it working fine -- I could fix any filestore problems 
> with e2fsck, mount /home, log in to my home directory, do pretty much 
> whatever I wanted -- until I tried logging into a new KDE Wayland 
> session again. This happened several times while I was trying to 
> troubleshoot the problem with startplasma-wayland.
> 
> Unfortunately, one time this didn't work. I was still able to start the 
> computer without starting the RAID array, reassemble it and reboot with 
> the RAID array looking seemingly okay (according to mdadm -D) BUT this 
> time, any attempt to access the RAID array or even just stop the array 
> (mdadm --stop /dev/md126, mdadm --stop /dev/md127) once it was started 
> would cause the RAID array to lock up. That means (I think) that I can't 
> create an image of the array contents using dd, which is what -- of 
> course -- I should have done in the first place. (I could assemble the 
> RAID array read-only, but the RAID array is out of sync because it 
> didn't shut down properly.)
> 
> I'm guessing that the contents of the filestore on the RAID array are 
> probably still there. Does anyone have suggestions on getting the RAID 
> array working properly again and accessing them? I have avoided doing 
> anything further myself because, of course, if the contents of the 
> filestore are still there, I don't want to do anything to jeopardize 
> them. You may tell me that I've done too much already. :-)

Hi Joel,
sorry for the late response, I was off for a few days. I see that you were
able to recover the data!

I think that the metadata manager (mdmon) is down or broken for some reason.
You can check its state with:
# systemctl status mdmon@md127.service

If you hit the problem again, please try the following (but do not abuse it,
use it only as a last resort!):
# systemctl restart mdmon@md127.service
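
The two commands above can be combined into a small check-then-restart
helper. This is only a sketch: the function names mdmon_unit and
mdmon_restart_if_down are made up for illustration, and it assumes the IMSM
container is /dev/md127 as in this thread and that the commands are run as
root.

```shell
#!/bin/sh
# Hypothetical helper: map a container device to its mdmon unit name,
# e.g. /dev/md127 -> mdmon@md127.service
mdmon_unit() {
    printf 'mdmon@%s.service\n' "$(basename "$1")"
}

# Hypothetical helper: restart mdmon only when systemd reports the unit as
# not active, so the restart stays a last resort.
mdmon_restart_if_down() {
    unit=$(mdmon_unit "$1")
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active, leaving it alone"
    else
        echo "$unit is not active, restarting"
        systemctl restart "$unit"
    fi
}
```

After sourcing the snippet, "mdmon_restart_if_down /dev/md127" would perform
the check. Note that a unit reported as active can still be unresponsive, so
the output of "systemctl status" is worth reading as well.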

We know that a change in systemd left our userspace metadata manager
unresponsive, because it could not be restarted after switch root. The issue
is fixed upstream:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=723d1df4946eb40337bf494f9b2549500c1399b2

I didn't read the whole thread, but the symptoms match this issue as far as I
can tell. Hopefully, you will find it useful.

Thanks,
Mariusz