Re: RAID6 12 device assemble force failure

On 2.07.2024 at 10:47, Mariusz Tkaczyk wrote:
On Mon, 1 Jul 2024 11:33:16 +0200
Adam Niescierowicz <adam.niescierowicz@xxxxxxxxxx> wrote:

Is there a way to force state=active in the metadata?
From what I saw, each drive has exactly the same Events: 48640 and
Update Time, so the data on the drives should be the same.
Most important: I advise you to clone the disks so you have a safe space for
practicing. Whatever you do now is risky, and we don't want to make the
situation worse. My suggestions might be destructive and I don't want to take
responsibility for making it worse.
I am aware of the danger. Thank you also for your help.
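If I do clone the disks first, this is roughly what I have in mind for each member
(the backup paths are just examples; GNU ddrescue instead of dd is also an option,
since it copes better with read errors):

# dd if=/dev/sdq1 of=/mnt/backup/sdq1.img bs=1M conv=noerror,sync status=progress

or, with ddrescue and a map file so an interrupted copy can resume:

# ddrescue /dev/sdq1 /mnt/backup/sdq1.img /mnt/backup/sdq1.map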
We have --dump and --restore functionality. I've never used it
(I am mainly IMSM focused), so I can only point out that it is there and that it is
an option for cloning the metadata.
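If I read the man page right, cloning just the metadata that way would look roughly
like this (the backup directory is only an example; the dump files are sparse, so
they should take very little space):

# mkdir /root/md-meta-backup
# mdadm --dump=/root/md-meta-backup /dev/sdq1 /dev/sdv1 /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 \
        /dev/sdk1 /dev/sds1 /dev/sdm1 /dev/sdn1 /dev/sdw1 /dev/sdt1

and, if needed later, per device:

# mdadm --restore=/root/md-meta-backup /dev/sdq1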

Native metadata keeps both spares and data devices in the same array, and we can see
that the spare state for those 3 devices is consistently reported on every drive.

It means that at some point metadata with the missing disks' state updated to
spare was written to all array members (including the spares), but it does not
mean that the data is consistent. You are recovering from an error scenario, and
whatever is there, you need to be ready for the worst case.

The brute-force method would be to recreate the array with the same creation
parameters and the --assume-clean flag, but this is risky. Your array was probably
created a few years (and mdadm versions) ago, so there could be small
differences in the array parameters mdadm would set now. Anyway, I see it as the
simplest option.

In my situation this is a fresh config, about 2 weeks old, and I can install exactly the same mdadm version.

If there is no other way to proceed, I will try to re-create the array with --force --assume-clean.
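As I understand it, that would look roughly like the sketch below, with the devices
given in slot order (0-11, as in the --assemble output further down) and "missing"
for the slots that dropped out. The chunk size and metadata version here are only
placeholders; I would copy the real values from --examine, and add --layout and
--data-offset as well if --examine shows non-default values:

# mdadm --create /dev/md126 --assume-clean --level=6 --raid-devices=12 \
        --metadata=1.2 --chunk=512 \
        /dev/sdt1 /dev/sdv1 /dev/sdn1 /dev/sdm1 /dev/sdw1 missing \
        /dev/sdr1 /dev/sds1 missing /dev/sdx1 missing /dev/sdz1

Getting the device order or offsets wrong would scramble the data, which is why I
would only try this on top of the clones.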

We can try to start the array manually by setting sysfs values; however, that
requires being well familiarized with the mdadm code, so it would be time consuming.
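Just to understand that route: as far as I can tell from Documentation/admin-guide/md.rst,
the inactive array already exposes its state under /sys/block/md126/md/, e.g.:

# cat /sys/block/md126/md/array_state
# cat /sys/block/md126/md/dev-sdt1/slot
# cat /sys/block/md126/md/dev-sdt1/state

and, if I read md.rst correctly, writing 'readonly' or 'active' into array_state is
what asks the kernel to actually run the array. I agree, though, that doing the whole
assembly by hand this way needs real familiarity with the code.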

What can I do to start this array?  
  You may try to add them manually. I know that there is
--re-add functionality but I've never used it. Maybe something like this would
work:
# mdadm --remove /dev/md126 <failed_drive>
# mdadm --re-add /dev/md126 <failed_drive>
I tried this but it didn't help.
Please provide the logs then (possibly with -vvvvv); maybe I or someone else can
help.

Logs
---

# mdadm --run -vvvvv /dev/md126
mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error

# mdadm --stop /dev/md126
mdadm: stopped /dev/md126

# mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1 /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1 /dev/sdn1 /dev/sdw1 /dev/sdt1
mdadm: looking for devices for /dev/md126
mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
mdadm: added /dev/sdv1 to /dev/md126 as 1
mdadm: added /dev/sdn1 to /dev/md126 as 2
mdadm: added /dev/sdm1 to /dev/md126 as 3
mdadm: added /dev/sdw1 to /dev/md126 as 4
mdadm: no uptodate device for slot 5 of /dev/md126
mdadm: added /dev/sdr1 to /dev/md126 as 6
mdadm: added /dev/sds1 to /dev/md126 as 7
mdadm: no uptodate device for slot 8 of /dev/md126
mdadm: added /dev/sdx1 to /dev/md126 as 9
mdadm: no uptodate device for slot 10 of /dev/md126
mdadm: added /dev/sdz1 to /dev/md126 as 11
mdadm: added /dev/sdq1 to /dev/md126 as -1
mdadm: added /dev/sdu1 to /dev/md126 as -1
mdadm: added /dev/sdk1 to /dev/md126 as -1
mdadm: added /dev/sdt1 to /dev/md126 as 0
mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to start the array.
---

Can somebody explain the array's behavior to me? (theory)

This is RAID-6, so after two disks are disconnected it still works fine. Then, when a third disk disconnects, the array should stop as faulty, yes?
If the array stops as faulty, the data on the array and on the third disconnected disk should be the same, yes?


-- 
---
Thanks
Adam Nieścierowicz

