Re: Failed Array Rebuild advice Please

jahammonds prost <gmitch64@xxxxxxxxx> · Tue, 10 Apr 2012 16:46:32 -0700 (PDT)

Neil,
    Thanks for the fast reply.

>    If you trust that sdd and sdm really are working now, you can
>
>    mdadm /dev/md0 --add /dev/sdd1 /dev/sdm1

I am running a read-only badblocks scan on them at the moment, but I don't really trust them. Once the scan has finished, I am going to replace them with a couple of new drives, and once I've run a full destructive badblocks test on those (which will take overnight), I will add them to the array. Thanks for the tip on having the rebuild run in parallel; that should speed up the rebuild.
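For reference, the two kinds of badblocks run mentioned above differ by one flag. By default badblocks does a non-destructive read-only scan, while -w does a write-mode test that overwrites every block on the device (-s shows progress, -v is verbose). The helper name `badblocks_cmd` and the device names are hypothetical; the sketch only prints the commands rather than running them, because -w erases the disk.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the badblocks command for a test mode.
#   readonly    -> default read-only scan, safe on a disk holding data
#   destructive -> -w write-mode test, ERASES everything on the device
badblocks_cmd() {
    mode=$1
    dev=$2
    case "$mode" in
        readonly)    echo "badblocks -sv $dev" ;;
        destructive) echo "badblocks -wsv $dev" ;;
        *)           echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

badblocks_cmd readonly /dev/sdd      # scan a suspect member without touching data
badblocks_cmd destructive /dev/sdX   # burn-in a new, empty replacement drive
```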

Once all that is done, I will run a destructive badblocks test on the original failed drives, and if they still pass, I will add them to the array as additional drives. This box is mainly a media server. What is the maximum number of drives you would suggest for a RAID6 setup?
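RAID6 always dedicates two drives' worth of space to parity, whatever its width, so usable capacity is (n - 2) x drive size and the array tolerates any two failed members. The hypothetical helper below just does that arithmetic for equal-sized drives; it does not settle the practical upper limit, which is a judgment about rebuild time and the risk of further failures while a rebuild is running.

```shell
#!/bin/sh
# Hypothetical helper: usable capacity of an n-drive RAID6 of equal-sized members.
# Two drives' worth of space always goes to the P and Q parity blocks.
raid6_usable() {
    n=$1       # number of member drives (RAID6 needs at least 4)
    size=$2    # per-drive size, any unit (e.g. GB)
    if [ "$n" -lt 4 ]; then
        echo "RAID6 needs at least 4 drives" >&2
        return 1
    fi
    echo $(( (n - 2) * size ))
}

raid6_usable 15 2000   # fifteen hypothetical 2000 GB drives
```

With fifteen 2000 GB members this prints 26000, i.e. 13 drives' worth of usable space; each extra drive adds one drive of capacity but no extra redundancy.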

YP.

----- Original Message -----
From: NeilBrown <neilb@xxxxxxx>
To: jahammonds prost <gmitch64@xxxxxxxxx>
Cc: Linux RAID <linux-raid@xxxxxxxxxxxxxxx>
Sent: Tuesday, 10 April 2012, 19:02
Subject: Re: Failed Array Rebuild advice Please

On Tue, 10 Apr 2012 15:32:44 -0700 (PDT) jahammonds prost
<gmitch64@xxxxxxxxx> wrote:

> Since I know I did nothing with the temporary one-drive array when the server was booted (and I don't think the md code did anything either?), would it be safe to
>  
> mdadm --assemble /dev/md0 /dev/sd[a-c]1 /dev/sd[e-h]1 /dev/sd[j-l]1 /dev/sd[n-p]1 --force
>  
> to let the array come back up and get it running?

Yes.   Though you don't need to exclude the oldest devices.  mdadm will
figure out which ones to use and which ones to avoid.
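The choice mdadm makes here is driven largely by each member's superblock event counter: members with the highest count are the most up to date, and the stale ones are the candidates to leave out. `mdadm --examine` prints that counter per device, so a quick look before forcing can be sketched as below. The function names are hypothetical, and the exact "Events" line format varies with metadata version, so the parser just prints whatever follows the colon.

```shell
#!/bin/sh
# Print the Events counter from `mdadm --examine` text read on stdin.
# Assumes the superblock dump contains a line like "         Events : 12345".
events_from_examine() {
    awk -F' : ' '/^ *Events :/ { print $2; exit }'
}

# Hypothetical helper: list each member's event count.
# Needs root and real devices, e.g.: show_event_counts /dev/sd[a-p]1
show_event_counts() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | events_from_examine)"
    done
}
```

Members whose counts lag well behind the rest are the ones mdadm is likely to treat as out of date.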

>  
> What would then be the correct sequence to replace the 2 failed drives (sdd1 and sdm1) and get the array running fully again?

If you trust that sdd and sdm really are working now, you can

mdadm /dev/md0 --add /dev/sdd1 /dev/sdm1

This will probably start rebuilding just one of the devices. You can check
with
   mdadm -D /dev/md0
which will report "spare rebuilding" against one or more devices.
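In the device table at the bottom of `mdadm -D` output, the last column is the device name, so the members currently being rebuilt can be picked out with a one-line filter. The function name is hypothetical, and it reads the -D output on stdin so it can be tried without a live array.

```shell
#!/bin/sh
# Hypothetical helper: print the devices that `mdadm -D` reports as
# "spare rebuilding". Usage: mdadm -D /dev/md0 | rebuilding_devices
rebuilding_devices() {
    awk '/spare rebuilding/ { print $NF }'
}
```

Piping the result through `wc -l` gives the number of devices rebuilding; if that is 1 after adding two drives, only one of them is being recovered.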
If it is only rebuilding one of them, then

  echo idle > /sys/block/md0/md/sync_action

will cause it to stop and restart the recovery.  It will then recover both
devices in parallel.

(Next version of mdadm will do this automatically).
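The check-then-restart step above can be sketched as a small function: if fewer devices are rebuilding than were added, write "idle" to the array's sync_action file so md stops and restarts recovery over all of them. The function name is hypothetical, and the sysfs path is a parameter (defaulting to /sys/block/md0/md/sync_action) only so the logic can be exercised against an ordinary file.

```shell
#!/bin/sh
# Hypothetical helper: restart md recovery if not every added device is rebuilding.
restart_recovery_if_needed() {
    rebuilding=$1                                # devices reported as rebuilding
    added=$2                                     # devices just added with --add
    action=${3:-/sys/block/md0/md/sync_action}   # md sysfs control file
    if [ "$rebuilding" -lt "$added" ]; then
        # Writing "idle" stops the current recovery; md then restarts it,
        # this time picking up all the spares together.
        echo idle > "$action"
        echo "restarted recovery"
    else
        echo "all $added devices already rebuilding"
    fi
}
```

On a live system (as root) it could be called as `restart_recovery_if_needed "$(mdadm -D /dev/md0 | grep -c 'spare rebuilding')" 2`.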

NeilBrown

>
> Thanks for your help.
>
> YP.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html