On Wed, 3 Jul 2024 09:42:53 +0200
Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:

> On Tue, 2 Jul 2024 19:47:52 +0200
> Adam Niescierowicz <adam.niescierowicz@xxxxxxxxxx> wrote:
>
> > >>>> What can I do to start this array?
> > >>> You may try to add them manually. I know that there is
> > >>> --re-add functionality, but I've never used it. Maybe something
> > >>> like this would work:
> > >>> # mdadm --remove /dev/md126 <failed_drive>
> > >>> # mdadm --re-add /dev/md126 <failed_drive>
> > >> I tried this but it didn't help.
> > > Please provide the logs then (possibly with -vvvvv); maybe I or
> > > someone else can help.
> >
> > Logs
> > ---
> >
> > # mdadm --run -vvvvv /dev/md126
> > mdadm: failed to start array /dev/md/card1pport2chassis1: Input/output error
> >
> > # mdadm --stop /dev/md126
> > mdadm: stopped /dev/md126
> >
> > # mdadm --assemble --force -vvvvv /dev/md126 /dev/sdq1 /dev/sdv1
> > /dev/sdr1 /dev/sdu1 /dev/sdz1 /dev/sdx1 /dev/sdk1 /dev/sds1 /dev/sdm1
> > /dev/sdn1 /dev/sdw1 /dev/sdt1
> > mdadm: looking for devices for /dev/md126
> > mdadm: /dev/sdq1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdv1 is identified as a member of /dev/md126, slot 1.
> > mdadm: /dev/sdr1 is identified as a member of /dev/md126, slot 6.
> > mdadm: /dev/sdu1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sdz1 is identified as a member of /dev/md126, slot 11.
> > mdadm: /dev/sdx1 is identified as a member of /dev/md126, slot 9.
> > mdadm: /dev/sdk1 is identified as a member of /dev/md126, slot -1.
> > mdadm: /dev/sds1 is identified as a member of /dev/md126, slot 7.
> > mdadm: /dev/sdm1 is identified as a member of /dev/md126, slot 3.
> > mdadm: /dev/sdn1 is identified as a member of /dev/md126, slot 2.
> > mdadm: /dev/sdw1 is identified as a member of /dev/md126, slot 4.
> > mdadm: /dev/sdt1 is identified as a member of /dev/md126, slot 0.
> > mdadm: added /dev/sdv1 to /dev/md126 as 1
> > mdadm: added /dev/sdn1 to /dev/md126 as 2
> > mdadm: added /dev/sdm1 to /dev/md126 as 3
> > mdadm: added /dev/sdw1 to /dev/md126 as 4
> > mdadm: no uptodate device for slot 5 of /dev/md126
> > mdadm: added /dev/sdr1 to /dev/md126 as 6
> > mdadm: added /dev/sds1 to /dev/md126 as 7
> > mdadm: no uptodate device for slot 8 of /dev/md126
> > mdadm: added /dev/sdx1 to /dev/md126 as 9
> > mdadm: no uptodate device for slot 10 of /dev/md126
> > mdadm: added /dev/sdz1 to /dev/md126 as 11
> > mdadm: added /dev/sdq1 to /dev/md126 as -1
> > mdadm: added /dev/sdu1 to /dev/md126 as -1
> > mdadm: added /dev/sdk1 to /dev/md126 as -1
> > mdadm: added /dev/sdt1 to /dev/md126 as 0
> > mdadm: /dev/md126 assembled from 9 drives and 3 spares - not enough to
> > start the array.
> > ---
>
> Could you please share the logs from the --re-add attempt? In the
> meantime I will try to simulate this scenario.
>
> > Can somebody explain the behavior of the array to me? (theory)
> >
> > This is RAID-6, so after two disks are disconnected it still works fine.
> > Then, when a third disk disconnects, the array should stop as faulty, yes?
> > If the array stops as faulty, the data on the array and on the third
> > disconnected disk should be the same, yes?
>
> If you recover only one drive (and start the doubly degraded array), it
> may lead to RWH (RAID write hole).
>
> If there were writes during the disk failures, we don't know which
> in-flight requests completed. The XOR-based calculations may then lead
> us to improper results for some sectors (we need to read all disks and
> XOR the data to get the data for the two missing drives).
>
> But... if you add all the disks back again, in the worst case we will
> read outdated data, and your filesystem should be able to recover from it.
>
> So yes, it should be fine if you start the array with all drives.
>
> Mariusz
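To make the quoted parity argument concrete, here is a toy example with
made-up single-byte "strips" D0=0xA5, D1=0x3C, D2=0x0F (this shows the
XOR/P parity only; real RAID-6 additionally keeps a Galois-field Q parity,
which is what allows it to reconstruct two missing devices):

# printf '0x%02X\n' $(( 0xA5 ^ 0x3C ^ 0x0F ))    # P  = D0 xor D1 xor D2
0x96
# printf '0x%02X\n' $(( 0x96 ^ 0xA5 ^ 0x0F ))    # D1 = P  xor D0 xor D2
0x3C

If a crash leaves D1 updated on disk but the matching P write unfinished
(or the other way around), the same reconstruction silently returns stale
bytes for that stripe - that is the write hole.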
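Also, before doing anything destructive it is worth putting the event
counters and device roles of all members side by side. A rough one-liner
(the glob is just an example that matches the drives from your --assemble
call):

# for d in /dev/sd[kmnqrstuvwxz]1; do echo "== $d"; mdadm --examine "$d" | grep -E 'Events|Device Role|Array State'; done

The drives whose Events value lags behind the rest are the ones that
dropped out first and hold the most outdated data.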
I was able to achieve a similar state:

# mdadm -E /dev/nvme2n1
/dev/nvme2n1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 8fd2cf1a:65a58b8d:0c9a9e2e:4684fb88
           Name : gklab-localhost:my_r6  (local to host gklab-localhost)
  Creation Time : Wed Jul  3 09:43:32 2024
     Raid Level : raid6
   Raid Devices : 4

 Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
     Array Size : 10485760 KiB (10.00 GiB 10.74 GB)
  Used Dev Size : 10485760 sectors (5.00 GiB 5.37 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=1942775216 sectors
          State : clean
    Device UUID : b26bef3c:51813f3f:e0f1a194:c96c4367

    Update Time : Wed Jul  3 11:49:34 2024
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : a96eaa64 - correct
         Events : 6

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : ..A. ('A' == active, '.' == missing, 'R' == replacing)

In my case the Events value was different, and /dev/nvme3n1 had a
different Array State:

    Device Role : Active device 3
    Array State : ..AA ('A' == active, '.' == missing, 'R' == replacing)

And I failed to start it, sorry. It is possible, but it requires working
with sysfs and ioctls directly, so it is much safer to recreate the array
with --assume-clean, especially since it is a fresh array.

Thanks,
Mariusz
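PS: If you do go the --create --assume-clean route, the new array must
match the original geometry exactly: same level, raid-devices count,
metadata version, chunk size, layout, data offset and, above all, the same
device order. The device order below is taken from the slot numbers in the
--assemble log above; the metadata/chunk/layout values and the <slot5>,
<slot8>, <slot10> devices are placeholders you must fill in from --examine
output before running anything, ideally against overlays of the disks
rather than the disks themselves:

# mdadm --create /dev/md126 --assume-clean --level=6 --raid-devices=12 \
        --metadata=1.2 --chunk=<chunk> --layout=<layout> \
        /dev/sdt1 /dev/sdv1 /dev/sdn1 /dev/sdm1 /dev/sdw1 <slot5> \
        /dev/sdr1 /dev/sds1 <slot8> /dev/sdx1 <slot10> /dev/sdz1

Slots 5, 8 and 10 belonged to the three drives that now show up as spares
(/dev/sdq1, /dev/sdu1, /dev/sdk1), but their superblocks no longer record
which was which; RAID-6 can start with at most two slots given as
"missing", so at least one of them must be mapped back correctly. If your
mdadm picks a different Data Offset than the old superblocks show, pass it
explicitly with --data-offset=. Afterwards, check the result read-only
(e.g. fsck -n or a read-only mount) before writing anything.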