Re: Replacing all disks in an array as a preventative measure before they fail.

On 07/02/2022 20:26, Red Wil wrote:
Hello,

It started as the subject said:
  - goal was to replace all 10 disks in a R6
  - context and perceived constraints
    - soft raid (no imsm or ddf containers)
    - multiple partitions per disk; partitions across the 10 disks formed the R6
    - downtime not an issue
    - minimize the number of commands
    - minimize disk stress
    - reduce the time spent on this process
    - difficult to add 10 spares at once in the rig
    - after a reshape/grow from 6 to 10 disks, the data offset of the raid
      members was all over the place, from ca. 10k sectors to 200k sectors

Approaches/solutions and critique
  1- add a 'spare' and 'replace' a raid member, one disk at a time
   critique:
   - seems to me a long and tedious process
   - cannot/will not run in parallel

There's no problem running in parallel as far as mdraid is concerned. If you can get the spare drives into the chassis (or on eSATA), you can --replace several drives at once.

And --replace pretty much just does a dd, except on the live system, keeping you raid-safe.
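A minimal sketch, per drive, assuming the array is /dev/md0 and sdX1/sdY1 stand in for your real old and new partitions:

    # add the new drive's partition as a spare
    mdadm /dev/md0 --add /dev/sdY1
    # copy the old member onto it, then drop the old one from the slot
    mdadm /dev/md0 --replace /dev/sdX1 --with /dev/sdY1

Issue several --replace commands back to back and md will run the copies concurrently.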

  2- add all the spares at once and perform 'replace' on members
   critique
   - just tedious - lots of cli commands, which are prone to mistakes.

Pretty much the same as (1). Given that your sdX names move all over the place, I would work with UUIDs; it's more typing, but it's safer.
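For example, instead of trusting /dev/sdX names, you can drive everything off the stable symlinks (the id strings below are made up; yours will carry the real model and serial):

    # map stable ids to the current kernel names
    ls -l /dev/disk/by-id/
    # confirm which device sits in which array slot
    mdadm --detail /dev/md0
    # then use the by-id paths in the actual command
    mdadm /dev/md0 --replace /dev/disk/by-id/ata-OLDMODEL_OLDSERIAL-part1 \
          --with /dev/disk/by-id/ata-NEWMODEL_NEWSERIAL-part1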

  the next ones assume I have all the 'spares' in the rig
  3- create new arrays on spares, fresh fs and copy data.

Well, you could fail/replace all the old drives, but yes, just building a new array from scratch (if you can afford the downtime) is probably better.
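If you do go that route, the skeleton is the usual create / mkfs / copy sequence. A sketch, assuming the new partitions are /dev/sd[k-t]1 and ext4 - all placeholders, pick your own filesystem and paths:

    # build the new 10-device raid-6 on the new drives' partitions
    mdadm --create /dev/md1 --level=6 --raid-devices=10 /dev/sd[k-t]1
    mkfs.ext4 /dev/md1
    mount /dev/md1 /mnt/new
    # copy only what you need; the old array stays intact as a fallback
    rsync -aHAX /mnt/old/ /mnt/new/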

  4- dd/ddrescue copy each drive to a new one. Advantage: can be done one
  by one or in parallel. Fewer commands in the terminal.

Fewer commands? Dunno about that. Much safer in many ways though: remove the drive you're replacing, copy it, put the new one back in its place. Less chance of a physical error.
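With the array offline the copy itself is a one-liner per disk (sdX old, sdY new, both placeholders - triple-check if= and of= before hitting enter):

    # straight byte-for-byte clone
    dd if=/dev/sdX of=/dev/sdY bs=64M conv=fsync status=progress
    # or, if the old drive might have read errors:
    ddrescue /dev/sdX /dev/sdY /root/sdX.map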

In the end I decided to use route (3):
  - flexibility on creation
  - copy only what I need
  - old array is a sort of backup

Question:
Just out of curiosity regarding (4), assuming the array is offline:
besides being not recommended in the case of imsm/ddf containers, which
(as far as I understood) keep some metadata on the hardware itself -

in the case of pure soft raid, is there anything technical or
safety-related that prevents a 'dd' copy of a physical hard drive from
acting exactly like the original?

Nope. You've copied the partition byte for byte; the raid won't know any different.
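If you want to convince yourself, compare the superblocks before and after the copy - both should report the same Array UUID, device role and event count:

    mdadm --examine /dev/sdX1
    mdadm --examine /dev/sdY1

Just don't assemble with both the original and the clone visible at once, or two devices will claim the same slot.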

One question, though. Why are you replacing the drives? Just a precaution?

How big are the drives? If you're not replacing dying drives, what I'd do is buy five or possibly six drives of twice the capacity, and do a --replace onto those five drives. Now take two of the drives you've removed and raid-0 them. Then do a major re-org: add your raid-0 as device 6, reduce your raid to a 6-device array, and remove the last four old drives from it.

Assuming you've only got 10 bays and you've been faffing about externally as you replace drives, you can now use the last three drives in the chassis to create another two-drive raid-0, add that as a spare into your raid-6, and add your last drive as a spare into both your raid-0s.

So you end up with a 6-device-plus-spare raid-6, and device 6 and the spare (your raid-0s) share a spare between them. A rough command sketch follows below.
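A very rough sketch of that re-org, assuming the raid-6 is /dev/md0 and all device names are placeholders. Shrinking the member count is a real reshape; read up on --grow (and on reducing --array-size first if the used size doesn't fit on 6 members) before trying it on live data:

    # raid-0 two of the removed old drives into one double-size device
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
    # bring it into the raid-6
    mdadm /dev/md0 --add /dev/md1
    # reshape from 10 members down to 6; needs a backup file, takes a while
    mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0-shrink.backup
    # the four old members the reshape pushed out become spares; remove them
    mdadm /dev/md0 --remove /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
    # a second two-drive raid-0, built the same way, goes in as the hot spare
    mdadm /dev/md0 --add /dev/md2

Sharing the last drive between the two raid-0s would go through the spare-group mechanism in mdadm.conf and mdmonitor, which moves spares between arrays in the same group.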

Cheers,
Wol


