Re: RAID6 12 device assemble force failure

On Wed, 3 Jul 2024 09:42:53 +0200
Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:

> On Tue, 2 Jul 2024 19:47:52 +0200
> Adam Niescierowicz <adam.niescierowicz@xxxxxxxxxx> wrote:
> 
> > >>>> What can I do to start this array?    
> > >>>    You may try to add them manually. I know that there is
> > >>> --re-add functionality but I've never used it. Maybe something like that
> > >>> would
> > >>> work:
> > >>> #mdadm --remove /dev/md126 <failed drive>
> > >>> #mdadm --re-add /dev/md126 <failed_drive>    
> > >> I tried this but didn't help.    
> > > Please provide a logs then (possibly with -vvvvv) maybe I or someone else
> > > would help.    
> > 
> > Logs
> > ---
> > 
> > # mdadm --run -vvvvv /dev/md126
> > mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error
> > 
> > # mdadm --stop /dev/md126
> > mdadm: stopped /dev/md126
> > 
> > # mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1 
> > /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1 
> > /dev/sdn1 /dev/sdw1 /dev/sdt1
> > mdadm: looking for devices for /dev/md126
> > mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
> > mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
> > mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
> > mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
> > mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
> > mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
> > mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
> > mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
> > mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
> > mdadm: added /dev/sdv1 to /dev/md126 as 1
> > mdadm: added /dev/sdn1 to /dev/md126 as 2
> > mdadm: added /dev/sdm1 to /dev/md126 as 3
> > mdadm: added /dev/sdw1 to /dev/md126 as 4
> > mdadm: no uptodate device for slot 5 of /dev/md126
> > mdadm: added /dev/sdr1 to /dev/md126 as 6
> > mdadm: added /dev/sds1 to /dev/md126 as 7
> > mdadm: no uptodate device for slot 8 of /dev/md126
> > mdadm: added /dev/sdx1 to /dev/md126 as 9
> > mdadm: no uptodate device for slot 10 of /dev/md126
> > mdadm: added /dev/sdz1 to /dev/md126 as 11
> > mdadm: added /dev/sdq1 to /dev/md126 as -1
> > mdadm: added /dev/sdu1 to /dev/md126 as -1
> > mdadm: added /dev/sdk1 to /dev/md126 as -1
> > mdadm: added /dev/sdt1 to /dev/md126 as 0
> > mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to 
> > start the array.
> > ---  
> 
> Could you please share the logs from the --re-add attempt? In the meantime I
> will try to simulate this scenario.
> > 
> > Can somebody explain the behavior of the array to me? (theory)
> > 
> > This is RAID-6, so after two disks are disconnected it still works fine.
> > Then, when a third disk disconnects, the array should stop as faulty, yes?
> > If the array stops as faulty, the data on the array and on the third
> > disconnected disk should be the same, yes?
> 
> If you recover only one drive (and start a double-degraded array), it may
> lead to RWH (RAID write hole).
> 
> If there were writes during the disk failures, we don't know which in-flight
> requests completed. The XOR-based calculations may lead us to improper
> results for some sectors (we need to read all disks and XOR the data to
> reconstruct the data for the 2 missing drives).
> 
> But... if you add all the disks again, in the worst case we will read
> outdated data, and your filesystem should be able to recover from it.
> 
> So yes, it should be fine if you start the array with all drives.
> 
> Mariusz
> 


I was able to achieve a similar state:

mdadm -E /dev/nvme2n1
/dev/nvme2n1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 8fd2cf1a:65a58b8d:0c9a9e2e:4684fb88
           Name : gklab-localhost:my_r6  (local to host gklab-localhost)
  Creation Time : Wed Jul  3 09:43:32 2024
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
     Array Size : 10485760 KiB (10.00 GiB 10.74 GB)
  Used Dev Size : 10485760 sectors (5.00 GiB 5.37 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=1942775216 sectors
          State : clean
    Device UUID : b26bef3c:51813f3f:e0f1a194:c96c4367

    Update Time : Wed Jul  3 11:49:34 2024
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : a96eaa64 - correct
         Events : 6

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : ..A. ('A' == active, '.' == missing, 'R' == replacing)


In my case, the events value was different and /dev/nvme3n1 had a different
Array State:
   Device Role : Active device 3
   Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)
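
For reference, a quick way to compare the superblocks side by side is a sketch
like the one below (the nvme names are from my test box; on your system it
would be the twelve /dev/sd*1 member partitions):

---
# mdadm -E /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 | \
    grep -E '/dev/|Update Time|Events|Device Role|Array State'
---

Members whose Events count is behind the rest are the ones that dropped out
first.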

And I failed to start it, sorry. It is possible, but it requires working with
sysfs and ioctls directly, so it is much safer to recreate the array with
--assume-clean, especially since it is a fresh array.
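
A rough sketch only, using the geometry of my test array from the -E output
above (the md node and the nvme member names are placeholders; for your array
the level, raid-devices count, metadata version, chunk, layout, data offset
and, most importantly, the device order per "Device Role" must be taken from
your own mdadm -E output):

---
# mdadm --stop /dev/md126
# mdadm --create /dev/md126 --assume-clean \
        --level=6 --raid-devices=4 --metadata=1.2 \
        --chunk=512 --layout=left-symmetric \
        --data-offset=132096 \
        /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
---

Here 132096 KiB corresponds to the 264192-sector Data Offset reported by -E.
--assume-clean keeps mdadm from starting a resync, and it is worth checking
the data read-only before allowing any writes.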

Thanks,
Mariusz



