Re: Recommendations needed for RAID5 recovery

On 25/06/16 17:49, Phil Turmel wrote:
> Hi Wol, Peter,
> 
> { Convention on kernel.org is to reply-to-all, bottom or interleave
> replies, and trim unnecessary context.  CC list fixed up accordingly. }

Sorry, but the OP had already been trimmed, I trimmed a bit further...
> 
> On 06/25/2016 07:43 AM, Wols Lists wrote:
> 
>> I know you're getting conflicting advice, but I'd try to get a good dd
>> backup first. I don't know of any utility that will do an md integrity
>> check on a ddrescue'd disk :-( so you'd need to do a fsck and hope ...
> 
> Conflicting advice indeed.  More conflict ahead:
> 
> dd is totally useless for raid recovery in all cases.  ddrescue may be
> of use in this case:

And if dd gets a copy without errors, what's the difference between that
and a ddrescue? Surely they're identical?

That said, it struck me you're probably better off using ddrescue,
because ddrescue can get that copy in one pass. And if you can get it in
one pass, it doesn't matter which tool you use, so you may as well use
ddrescue and save a wasted attempt with dd. (I've just read the ddrescue
man page. Recommended reading ... :-)
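A minimal sketch of the usual ddrescue two-pass approach, with placeholder device names (/dev/sdX is the failing source, /dev/sdY the fresh target):

```shell
# First pass: copy the easy areas quickly, skipping bad sectors (-n),
# recording progress in a map file so the run can be interrupted and resumed.
ddrescue -f -n /dev/sdX /dev/sdY rescue.map

# Second pass: go back and retry only the bad areas a few times (-r3),
# using the same map file to pick up where the first pass left off.
ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map
```

The map file is the key difference from dd: if the copy completes with no errors, the result is byte-identical to a dd copy, but ddrescue costs nothing extra and saves the restart if errors do turn up.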


> 
> Which means that the balance of the drives have no redundancy
> available to reconstruct data for any UREs remaining in the array.  If
> there were, forced assembly of originals after any timeout mismatch
> fixes would be the correct solution.  That would let remaining
> redundancy fix UREs while adding more redundancy (the #1 reason for
> choosing raid6 over raid5).
> 
> Peter, I strongly recommend that you perform a forced assembly on the
> three drives, omitting the unit kicked out last year.  (After fixing any
> timeout issue, if any.  Very likely, btw.)  Mount the filesystem
> read-only and backup the absolutely critical items.  Do not use fsck
> yet.  You may encounter UREs that cause some of these copies to fail,
> letting you know which files to not trust later.  If you encounter
> enough failures to drop the array again, simply repeat the forced
> assembly and readonly mount and carry on.
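The steps Phil describes might look like this; device names and the ERC value are placeholder assumptions, and the timeout fix follows the usual linux-raid advice for desktop drives in an array:

```shell
# Fix any timeout mismatch first: ask each member drive to give up on a
# bad sector after 7 seconds (value is in tenths of a second)...
smartctl -l scterc,70,70 /dev/sdX

# ...or, if the drive doesn't support SCT ERC, raise the kernel's SCSI
# command timeout instead so the drive is not kicked mid-recovery.
echo 180 > /sys/block/sdX/device/timeout

# Force assembly from the three recent members, omitting the drive
# kicked out last year, then mount read-only and copy off critical data.
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
mount -o ro /dev/md0 /mnt
```

If a URE drops the array during the copy, the same two commands repeat the forced assembly and read-only mount.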
> 
> When you've gotten all you can that way, shut down the array and use
> ddrescue to duplicate all three drives.  Take the originals out of the
> box, and force assemble the new drives.  Run fsck to fix any remaining
> errors from zeroed blocks, then mount and backup anything else you need.
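A sketch of that clone-then-repair stage, again with hypothetical device names (one ddrescue invocation per original/replacement pair):

```shell
# Duplicate each of the three members onto a fresh drive, one pair at a
# time, keeping a separate map file per drive.
ddrescue -f -n /dev/old1 /dev/new1 old1.map
ddrescue -f -n /dev/old2 /dev/new2 old2.map
ddrescue -f -n /dev/old3 /dev/new3 old3.map

# Take the originals out of the box, then assemble from the copies only.
mdadm --assemble --force /dev/md0 /dev/new1 /dev/new2 /dev/new3

# Only now run fsck -- on the copies, never the originals -- to repair
# whatever the zeroed (unreadable) blocks broke, then mount and back up.
fsck -y /dev/md0
```

Working on the copies means a botched fsck can always be retried by re-cloning from the untouched originals.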
> 
> If you need to keep costs down, it would be fairly low risk to just
> ddrescue the most recent failure onto the oldest (which will write over
> any UREs it currently has).  Then forced assemble with it instead.
> 
> And add a drive to the array to get back to a redundant operation.
> Consider adding another drive after that and reshaping to raid6.  If
> your drives really are ok (timeout issue, not physical), then you could
> re-use one or more of the originals to get back to full operation.  Use
> --zero-superblock on them to allow MD to use them again.
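Those last steps, sketched with placeholder names (assuming the original was a four-drive RAID5 and /dev/sdd1 is the re-used drive):

```shell
# Clear the stale metadata so MD will treat the old drive as a new member.
mdadm --zero-superblock /dev/sdd1

# Add it as a spare; the array rebuilds redundancy onto it.
mdadm --add /dev/md0 /dev/sdd1

# With a fifth device added the same way, reshape RAID5 -> RAID6.
# (Depending on kernel/mdadm versions this may also need a --backup-file.)
mdadm --grow /dev/md0 --level=6 --raid-devices=5
```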
> 
Hmm...

Would it be an idea to get 4 x 3TB drives? That way he can do the
backup straight onto a RAID6 array, and IF he gets a successful backup
then the old drives are now redundant, for backups or whatever (3TB Reds
or NAS drives are about £100 each...)
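For completeness, a hedged sketch of that alternative, with hypothetical device names for the four new drives:

```shell
# Build a fresh RAID6 from the four new 3TB drives and put a filesystem
# on it; the rescued data is then copied here, and the old drives become
# spare backup media if the copy succeeds.
mdadm --create /dev/md1 --level=6 --raid-devices=4 \
    /dev/sde /dev/sdf /dev/sdg /dev/sdh
mkfs.ext4 /dev/md1
```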

Cheers,
Wol
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
