Re: Replacing all disks in an array as a preventative measure before they fail.

On Mon, 7 Feb 2022 22:28:57 +0000
Wol <antlists@xxxxxxxxxxxxxxx> wrote:

> On 07/02/2022 20:26, Red Wil wrote:
> > Hello,
> > 
> > It started as the subject said:
> >   - goal was to replace all 10 disks in an R6
> >   - context and perceived constraints
> >     - soft raid (no IMSM or DDF containers)
> >     - multiple partitions per disk; matching partitions across the
> >       10 disks form the R6 arrays
> >     - downtime not an issue
> >     - minimize the number of commands
> >     - minimize disks stress
> >     - reduce the time spent with this process
> >     - difficult to add 10 spares at once in the rig
> >     - after a reshape/grow from 6 to 10 disks, the data offset in the
> >       raid members was all over the place, from circa 10k to 200k sectors
> > 
> > Approaches/solutions and critique
> >   1- add a 'spare' one at a time and 'replace' each raid member
> >    critique:
> >    - seems to me a long and tedious process
> >    - cannot/will not run in parallel  
> 
> There's not a problem running in parallel as far as mdraid is
> concerned. If you can get the spare drives into the chassis (or on
> eSATA), you can --replace several drives at once.
> 
> And it pretty much just does a dd, just on the live system keeping
> you raid-safe.
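If I have the syntax right, a single replace cycle per member would be
roughly the following (md0, sdb1 and sdk1 are just placeholder names):

  # add the new partition as a spare, then ask md to copy onto it
  mdadm /dev/md0 --add /dev/sdk1
  mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sdk1
  # repeat per member; per your note above, several such replacements
  # can be running on the same array at once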
If I remember correctly, if you have multiple partitions on a single
disk (belonging to different arrays, obviously) and you start a
sync/resync operation on all the arrays that use that particular
spindle/disk, the operations are run sequentially. If they ran in
parallel, the head movement would stress the disks.
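That should be easy to check by kicking off a check on two arrays that
share the same disks; if I remember right, /proc/mdstat shows the
second one as delayed until the first finishes:

  echo check > /sys/block/md0/md/sync_action
  echo check > /sys/block/md1/md/sync_action
  cat /proc/mdstat   # typically one resyncs, the other shows "resync=DELAYED"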
> 
> >   2- add all the spares at once and perform 'replace' on members
> >    critique
> >    - just tedious - lots of CLI commands, which are prone to
> >      mistakes.
> 
> Pretty much the same as (1). Given that your sdX names are moving all
> over the place, I would work with UUIDs; even though it's more typing,
> it's safer.
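Agreed - before typing any --replace I would double-check which device
is which with something like the following (names are examples only):

  mdadm --detail /dev/md0                   # array UUID and member list
  mdadm --examine /dev/sdb1 | grep -i uuid  # which array this partition belongs to
  ls -l /dev/disk/by-id/                    # stable names instead of moving sdX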
> 
> >   next ones assume I have all the 'spares' in the rig
> >   3- create new arrays on spares, fresh fs and copy data.  
> 
> Well, you could fail/replace all the old drives, but yes just
> building a new array from scratch (if you can afford the downtime) is
> probably better.
Another reason to go this route was to be able to tune/tweak the whole
stack (RAID-LVM-FS).
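Roughly what I have in mind for the new stack, with made-up device
names and sizes and the tuning values still to be decided:

  mdadm --create /dev/md10 --level=6 --raid-devices=10 --chunk=512K /dev/sd[k-t]1
  pvcreate /dev/md10
  vgcreate vg_new /dev/md10
  lvcreate -L 2T -n data vg_new
  mkfs.xfs /dev/vg_new/data    # mkfs picks up the stripe geometry from md/LVM
  mount /dev/vg_new/data /mnt/new
  rsync -aHAX /mnt/old/ /mnt/new/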
> 
> >   4- dd/ddrescue copy each drive to a new one. Advantage: can be
> >      done one by one or in parallel; fewer commands in the terminal.
> 
> Fewer commands? Dunno about that. Much safer in many ways though:
> remove the drive you're replacing, copy it, put the new one back.
> Less chance of a physical error.
Well... it's a matter of perception. For 10 disks I would have 10 dd
commands of the form "dd if=olddrive of=newdrive <some params>", or even
better "ddrescue olddrive newdrive logfile". Otherwise, the mdadm
commands would come to 50 in total for the 10 disks, since I have 5
individual arrays across them.
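That is, per physical disk, with the arrays stopped, something like the
following (old/new device names are placeholders):

  mdadm --stop /dev/md0    # and likewise the other four arrays on these disks
  ddrescue -f /dev/sdb /dev/sdl /root/sdb-to-sdl.map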
> > 
> > In the end I decided I will use route (3).
> >   - flexibility on creation
> >   - copy only what I need
> >   - old array is a sort of backup
> > 
> > Question:
> > Just out of curiosity regarding (4), assuming the array is offline,
> > and besides it being not recommended in the case of IMSM/DDF
> > containers, which (as far as I understood) keep some metadata on the
> > hardware itself:
> > 
> > in the case of pure soft raid, is there anything technical or
> > safety-related that prevents a 'dd' copy of a physical hard drive
> > from acting exactly like the original?
> >   
> Nope. You've copied the partition byte for byte, the raid won't know
> any different.
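Good to know. I suppose the copies can be sanity-checked before
assembling, keeping the old disks disconnected so the duplicate UUIDs
don't confuse assembly (device names are placeholders):

  mdadm --examine /dev/sdl1    # superblock UUID/role should match the old member
  mdadm --assemble --scan      # assemble from the copies once the originals are out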
> 
> One question, though. Why are you replacing the drives? Just a
> precaution?
> 
> How big are the drives? What I'd do if you're not replacing dying 
> drives, is buy five or possibly six drives of twice the capacity. Do
> a --replace on those five drives. Now take two of the drives you've 
> removed, raid-0 them, and now do a major re-org, adding your raid-0
> as device 6, reducing your raid to a 6-device array, and removing the
> last four old drives from the array. Assuming you've only got 10 bays
> and you've been faffing about externally as you replace drives, you
> can now use the last three drives in the chassis to create another
> two-drive raid-0, add that as a spare into your raid-6, and add your
> last drive as a spare into both your raid-0s.
> 
> So you end up with a 6-device-plus-spare raid-6, and device 6 and the
> spare (your two raid-0s) share a spare drive between them.
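If I read the plan right, that would be something along these lines once
the five --replace operations are done (device names are placeholders,
and the exact ordering of the component-size grow, filesystem shrink and
--raid-devices shrink would need careful checking first):

  mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdq1 /dev/sdr1
  mdadm /dev/md0 --add /dev/md1
  mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0-reshape.bak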
> 
> Cheers,
> Wol
I was thinking of cutting the number of drives from 10 to 6 by using
double-size drives, but financial considerations at the time meant I
ended up with 10 slightly larger drives.

Thanks for your comments
Red


