On Thu, 4 Jul 2024 14:35:26 +0200 Adam Niescierowicz
<adam.niescierowicz@xxxxxxxxxx> wrote:

> On 4.07.2024 at 13:06, Mariusz Tkaczyk wrote:
> >> Data that can't be stored on the faulty device should be kept in the
> >> bitmap. Next, when we reattach the missing third drive and write the
> >> missing data from the bitmap to the disk, everything should be good,
> >> yes?
> >>
> >> Am I thinking correctly?
> >>
> > Bitmap doesn't record writes. Please read:
> > https://man7.org/linux/man-pages/man4/md.4.html
> > The bitmap is used to optimize resync and recovery in case of re-add
> > (but we know that it won't work in your case).
>
> Is there a way to make storage more fault tolerant?
>
> From what I have seen till now, one array = one PV (LVM) = one LV (LVM)
> = one FS. Mixing two arrays in LVM and FS isn't good practice.

I don't have the expertise to advise about FS and LVM. We (MD) can offer
you RAID6, RAID1 and RAID10, so please choose wisely what best fits your
needs. RAID1 is the most fault tolerant, but its capacity is the lowest.

> But what about the raid configuration?
> I have 4 external backplanes, 12 disks each. Each backplane is attached
> by four external SAS LUNs.
> In a scenario where I attach three disks to one LUN and that LUN
> crashes or hangs and then restarts or ..., the data on the array will
> be damaged, yes?

Yes, that could be. RAID6 cannot save you from that. It tolerates up to
2 failures, not more. That is why backups are important.

Leading an array to a failed state may cause data damage; any recovery
from something like that is recovery from an error scenario, so data
might be damaged. I cannot say yes or no because it varies. Generally,
you should always be ready for the worst case.

We *should* not record the failed state in metadata, to give the user a
chance to recover from such a scenario, so I don't get why it happened
(maybe a bug?). I will try to find time to work on it in the next weeks.

> I think that I can create a raid5 array for the three disks on one LUN,
> so when the LUN freezes, disconnects, hangs, etc., one array will stop,
> like a server crash without power, and this should be recoverable
> (until now I didn't have a problem with array rebuild in this kind of
> situation).

We cannot record any failure because we lose all the drives at the same
moment. It is kind of a workaround; it will save you from going to a
failed or degraded state. There could still be a filesystem error, but
probably a correctable one (if the array wasn't degraded; otherwise RWH,
the RAID write hole, may happen).

> The problem is with disk usage: each 12-disk backplane will use 4 disks
> for parity (12 disks = 4 LUNs = 4 raid5 arrays).
>
> Is there a better way to do this?

It depends on what you mean by better :) This is always a compromise
between performance, capacity and redundancy. If you are satisfied with
raid5 performance, and you think that the redundancy offered by this
approach is enough for your needs, this is fine. If you need a more
fault tolerant array (or arrays), please consider raid1 and raid10.

> >>> And I failed to start it, sorry. It is possible, but it requires
> >>> working with sysfs and ioctls directly, so it is much safer to
> >>> recreate the array with --assume-clean, especially as it is a fresh
> >>> array.
> >> I recreated the array; LVM detected the PV and works fine, but XFS
> >> above the LVM is missing data from the recreated array.
> >>
> > Well, it looks like you did it right, because LVM is up. Please
> > compare whether the disks are ordered the same way in the new array
> > (indexes of the drives in mdadm -D output). Just to be double sure.
>
> How can I assign a raid disk number to each disk?

Order in the create command matters.
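To double-check what the new array has actually recorded, you can read
the role stored in each member's superblock and compare it against the
order you passed at create time. A minimal sketch, assuming v1.x
metadata (where mdadm --examine prints a "Device Role" line); the device
names below are placeholders for your actual members:

  for d in /dev/sdb /dev/sdc /dev/sdd; do   # placeholders: list all members
      printf '%s: ' "$d"
      mdadm --examine "$d" | grep 'Device Role'
  done

mdadm -D on the array shows the same mapping in its device table (the
RaidDevice column).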
You must pass the devices in the same order as they were, starting from
the lowest one, i.e.:

mdadm -CR volume -n 12 /dev/disk1 /dev/disk2 ...

If you are using bash completion, please be aware that it may order them
differently.

Mariusz
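PS: If you ever need to redo the recreate, the full command would look
roughly like the sketch below. Every parameter here is only an example
and must match what the original array was created with (array name,
level, chunk size, metadata version, device order); --level=6 and
--chunk=512 are assumptions, not values taken from your setup:

  mdadm --create --run /dev/md/volume --assume-clean \
        --level=6 --raid-devices=12 --chunk=512 --metadata=1.2 \
        /dev/disk1 /dev/disk2 ... /dev/disk12

If the mdadm version differs from the one used originally, the default
data offset may also differ; it can be pinned with --data-offset if
needed.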