Re: RAID6 12 device assemble force failure

Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> · Tue, 2 Jul 2024 10:47:15 +0200

On Mon, 1 Jul 2024 11:33:16 +0200
Adam Niescierowicz <adam.niescierowicz@xxxxxxxxxx> wrote:

> Is there a way to force state=active in the metadata?
>  From what I saw each drive have exactly the same Events: 48640 and 
> Update Time so data on the drive should be the same.

Most importantly: I advise you to clone the disks so that you have a safe space
for practicing. Whatever you do now is risky, and we don't want to make the
situation worse. My suggestions might be destructive, and I don't want to take
responsibility for making it worse.
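
A minimal sketch of the cloning step, assuming GNU ddrescue is installed and
that every member has a target disk of at least the same size. The device names
are placeholders, not taken from this thread, and the helper name is made up.
It only prints the commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (do not run) one ddrescue command per source/target pair.
# Arguments are pairs: SRC DST SRC DST ...
# Device names are hypothetical; verify them (e.g. with lsblk) before use.
print_clone_cmds() {
    while [ "$#" -ge 2 ]; do
        src=$1
        dst=$2
        shift 2
        # -f allows writing to an existing block device; the last argument
        # is the map file that lets ddrescue resume an interrupted copy.
        printf 'ddrescue -f %s %s %s.map\n' "$src" "$dst" "${src##*/}"
    done
}

print_clone_cmds /dev/sdX /dev/sdY
```

All further experiments would then be done on the clones only, with the
original disks left untouched.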

We have --dump and --restore functionality. I've never used it (I am mainly
focused on IMSM), so I can only point out that it exists and that it is an
option for cloning the metadata.
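
A rough sketch of how --dump and --restore could be invoked, based on the
mdadm(8) man page rather than on hands-on experience. The backup directory and
device names are placeholders, and the helper name is made up. It only prints
the commands.

```shell
#!/bin/sh
# Print (do not run) an mdadm metadata dump or restore command.
# Usage: print_metadata_cmd dump|restore DIR DEVICE...
# DIR and the device names are hypothetical placeholders.
print_metadata_cmd() {
    mode=$1
    dir=$2
    shift 2
    # "$*" joins the remaining device names with single spaces.
    printf 'mdadm --%s=%s %s\n' "$mode" "$dir" "$*"
}

print_metadata_cmd dump /root/md-meta /dev/sdX /dev/sdY
print_metadata_cmd restore /root/md-meta /dev/sdX /dev/sdY
```

According to the man page, the dumped files are sparse and as large as the
device, so the backup directory should be on a filesystem that handles sparse
files well.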

Native metadata keeps both spares and data devices in the same array, and we can
see that the spare state for those 3 devices is reported consistently on every
drive.

It means that at some point metadata with the missing disks' states updated to
spare has been written to all array members (including the spares), but it does
not mean that the data is consistent. You are recovering from an error scenario,
and whatever is there, you need to be ready for the worst case.
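
To compare what each member reports, something like the following could be
used. It is only a sketch: it assumes the "Events" and "Device Role" lines in
the form printed by mdadm --examine for native 1.x metadata, and the helper
name is made up.

```shell
#!/bin/sh
# Reduce `mdadm --examine` output (read from stdin) to one summary line.
# Assumes lines of the form "Events : N" and "Device Role : ...".
summarize_examine() {
    awk -F ' : ' '
        /^ *Events :/      { events = $2 }
        /^ *Device Role :/ { role = $2 }
        END { printf "events=%s role=%s\n", events, role }
    '
}

# Only touches devices when they are passed as arguments, for example:
#   sh summarize.sh /dev/sdX /dev/sdY    (hypothetical names)
if [ "$#" -gt 0 ]; then
    for dev in "$@"; do
        printf '%s ' "$dev"
        mdadm --examine "$dev" | summarize_examine
    done
fi
```

Identical event counts only show that the metadata was written at the same
time. They say nothing about whether the data blocks are consistent.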

The brute-force method would be to recreate the array with the same startup
parameters and the --assume-clean flag, but this is risky. Your array was
probably created a few years (and mdadm versions) ago, so there could be small
differences in the array parameters mdadm sets now. Anyway, I see it as the
simplest option.
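
A sketch of what such a recreate command could look like. Only level 6 and the
12 devices come from the subject line. The metadata version, chunk size, data
offset and device order are placeholders that would have to be copied from
saved mdadm --examine output, and the helper name is made up. It only prints
the command, and it should be run against clones.

```shell
#!/bin/sh
# Print (do not run) an mdadm --create command for review.
# Usage: print_recreate_cmd MD METADATA CHUNK DATA_OFFSET DEVICE...
# Devices must be listed in their original slot order; the word "missing"
# can stand in for a slot that should stay empty.
print_recreate_cmd() {
    md=$1
    metadata=$2
    chunk=$3
    offset=$4
    shift 4
    # After the shift, $# is the number of member slots given.
    printf 'mdadm --create %s --assume-clean --metadata=%s --level=6 --raid-devices=%s --chunk=%s --data-offset=%s %s\n' \
        "$md" "$metadata" "$#" "$chunk" "$offset" "$*"
}

# Example with hypothetical values (12 devices would be listed in practice):
#   print_recreate_cmd /dev/md126 1.2 512K 128M /dev/sdA /dev/sdB ...
```

If any of these values differs from what the array was originally created
with, the assembled array will expose garbage even though the creation itself
succeeds, so a read-only check of the result would be a sensible next step.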

We could try to start the array manually by setting sysfs values; however, that
would require good familiarity with the mdadm code, so it would be time
consuming.
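
As a read-only first step, the current sysfs view of the array could be
inspected. This sketch only reads the array_state, state and slot attributes
from the md sysfs directory and writes nothing. The helper name is made up,
and the path in the usage comment assumes the array is md126.

```shell
#!/bin/sh
# Print the array state and, for each member, its state and slot.
# Usage: show_md_state /sys/block/md126/md
show_md_state() {
    mddir=$1
    printf 'array_state=%s\n' "$(cat "$mddir/array_state")"
    for d in "$mddir"/dev-*; do
        # Skip the unexpanded pattern when there are no members.
        [ -d "$d" ] || continue
        printf '%s state=%s slot=%s\n' \
            "${d##*/}" "$(cat "$d/state")" "$(cat "$d/slot")"
    done
}

if [ "$#" -gt 0 ]; then
    show_md_state "$1"
fi
```

Which values would have to be written, and in what order, to actually start
the array is not covered here.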

>>> What can I do to start this array?  
>>   You may try to add them manually. I know that there is
>> --re-add functionality but I've never used it. Maybe something like that
>> would work:
>> #mdadm --remove /dev/md126 <failed drive>
>> #mdadm --re-add /dev/md126 <failed_drive>  
>I tried this but didn't help.

Please provide the logs then (possibly with -vvvvv); maybe I or someone else
will be able to help.

Thanks,
Mariusz