Re: RAID6 fallen apart

Neil Brown <neilb@xxxxxxx> · Mon, 28 Aug 2006 12:03:58 +1000

On Saturday August 26, wferi@xxxxxxx wrote:
> Hi,
> 
> after an intermittent network failure, our RAID6 array of AoE devices
> can't run anymore.  Looks like the system dropped each of the disks one
> after the other, and at the third the array failed as expected.
> Trying to assemble the array results in all disks going into spare
> status, nothing useful.  The disks really must have been cut
> simultaneously, but their superblocks were probably altered since then
> by the recovery attempts.
> 
> Can anybody suggest a possible way out?  I'm thinking like restoring
> all the superblocks into a clean state, starting the array, checking
> the filesystem and doing a full copy of it, but don't know how to
> restore the superblocks.  Or am I mistaken?

You say some of the drives are 'spare'.  How did that happen?  Did you
try to add them back to the array after it had failed?  That is a
mistake.
The thing to do at that point is:
  - stop the array
  - make sure the network is back and the individual drives are
    working
  - use mdadm to assemble with --force.  This should 'just work'. 
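
As a rough sketch, the three steps above might look like the script
below.  The array name (/dev/md0) and the AoE device names are
placeholders, not taken from the original report, and using
"mdadm --examine" as the "is the drive working" check is an assumption:
it only shows that each superblock can be read again.  The script
prints the commands rather than running them unless RUN=1 is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed with a leading "+".
# Set RUN=1 only after checking the names below against your system.
MD=${MD:-/dev/md0}    # hypothetical array device
# Hypothetical AoE member devices; substitute the real ones.
DEVS=${DEVS:-"/dev/etherd/e0.0 /dev/etherd/e1.0 /dev/etherd/e2.0 /dev/etherd/e3.0"}

run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

reassemble() {
    # 1. stop the (failed) array
    run mdadm --stop "$MD"
    # 2. reading each superblock shows the device is reachable again
    for d in $DEVS; do
        run mdadm --examine "$d" || return 1
    done
    # 3. forced assembly; $DEVS is deliberately unquoted so that it
    #    splits into one argument per device
    run mdadm --assemble --force "$MD" $DEVS
}

reassemble
```

The --examine output also shows each device's event count and role,
which is worth saving before attempting anything else.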

But if you used --add, then you will have destroyed info in the
superblock.  That isn't the end of the world, but it makes recovery a
little harder.

The easiest thing to do is simply recreate the array, making sure to
list the drives in their original order and to keep any options (like
chunk size) the same.  This will not hurt the data (if done correctly).
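
A hedged sketch of such a recreate follows.  Every value in it (array
name, device names and their order, device count, chunk size) is a
placeholder that has to be replaced with what the original array
actually used; if that array was created with a non-default layout or
metadata version, those options would have to match as well.  Two
things here are additions that the advice above does not mention:
--assume-clean, which asks mdadm not to start a resync so that a wrong
guess at the order writes nothing beyond new superblocks, and the
read-only "fsck -n", which assumes the filesystem sits directly on the
md device and that its fsck accepts -n (as ext3's does).  As written
the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Dry run by default: each command is printed with a leading "+".
# Set RUN=1 only once every value below matches the ORIGINAL array.
MD=${MD:-/dev/md0}    # hypothetical array device
NDEV=${NDEV:-4}       # hypothetical number of member devices
CHUNK=${CHUNK:-64}    # hypothetical chunk size in KiB
# Hypothetical members, listed in their original slot order.
DEVS=${DEVS:-"/dev/etherd/e0.0 /dev/etherd/e1.0 /dev/etherd/e2.0 /dev/etherd/e3.0"}

run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '+ %s\n' "$*"; fi
}

recreate() {
    run mdadm --stop "$MD"
    # Writes fresh superblocks; with --assume-clean no resync is started.
    # $DEVS is deliberately unquoted: one argument per device, in order.
    run mdadm --create "$MD" --level=6 --raid-devices="$NDEV" \
        --chunk="$CHUNK" --assume-clean $DEVS || return 1
    # Read-only check: -n answers "no" to every repair prompt.
    run fsck -n "$MD"
}

recreate
```

If the read-only check reports heavy damage, the order or an option is
probably wrong; stopping the array and trying again with a corrected
device list is safer than letting fsck repair anything.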

NeilBrown