Re: RAID5 with 2 drive failure at the same time

On Feb 1, 2013, at 6:34 AM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> It'd also be useful to know whether sdg has been rewritten at
> all since then (i.e. whether the testing was destructive or not), and
> whether or not the array was written to at all since the failure of sdg.

OP needs to reply back.

Also, I'd like to know what model the disks are, and whether they're AF (Advanced Format) or not.
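Part of why AF matters here is sector arithmetic. On a 512e Advanced Format drive, each 512-byte LBA the kernel reports lives inside a 4 KiB physical sector, so any attempt to read or rewrite a bad spot should cover the whole physical sector. Below is a minimal sketch of that mapping. It assumes 512-byte logical and 4096-byte physical sectors; the real values are in /sys/block/<dev>/queue/logical_block_size and physical_block_size. The function name is hypothetical.

```python
# Hypothetical helper: map a 512-byte logical LBA to the run of logical
# sectors that share its 4 KiB physical sector on a 512e (AF) drive.
# The default sizes are assumptions; read the real ones from
# /sys/block/<dev>/queue/{logical,physical}_block_size.

LOGICAL = 512
PHYSICAL = 4096


def physical_span(lba: int, logical: int = LOGICAL,
                  physical: int = PHYSICAL) -> tuple[int, int]:
    """Return (first_lba, count) covering the physical sector that holds lba."""
    per_phys = physical // logical        # 8 logical sectors per 4 KiB sector
    first = (lba // per_phys) * per_phys  # round down to the physical boundary
    return first, per_phys
```

For example, physical_span(1234567) returns (1234560, 8): LBAs 1234560 through 1234567 share one physical sector.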

> Yes, if sdg still contains valid array data (and the array wasn't
> written since then) then it would definitely make more sense to recreate
> the array using it, leaving sdj out for now. That'll require more work
> checking mdadm versions and data offset values though. That'll avoid the
> issues with the unreadable blocks on sdj.
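Checking data offsets matters because a recreate only lines up with the existing data if every member keeps the data offset the array was originally built with, and different mdadm versions may pick different defaults. A minimal sketch for comparing saved `mdadm --examine` output follows. It assumes the v1.x superblock format, where the field prints as "Data Offset : 2048 sectors"; 0.90 superblocks may not print it at all. Both function names are hypothetical.

```python
import re
from typing import Optional

# Assumed v1.x `mdadm --examine` line format, e.g. "    Data Offset : 2048 sectors".
DATA_OFFSET = re.compile(r"^\s*Data Offset\s*:\s*(\d+)\s+sectors", re.MULTILINE)


def data_offset_sectors(examine_text: str) -> Optional[int]:
    """Return the Data Offset in sectors, or None if the field is absent."""
    m = DATA_OFFSET.search(examine_text)
    return int(m.group(1)) if m else None


def offsets_agree(examines: dict[str, str]) -> bool:
    """True if every member's saved --examine text reports the same known offset."""
    found = {data_offset_sectors(text) for text in examines.values()}
    return len(found) == 1 and None not in found
```

The input would be a dict of member name to saved `mdadm --examine` text; a False result means the offsets need a closer look before any recreate.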

Here's an idea. Use dd to read the sector on sdg1 that error1.txt reported with the write error into a file, and see whether that produces a read error. If it doesn't, write the data back to the same sector and see whether that produces a write error. If it doesn't, attempt to force-assemble the array (or recreate it with --assume-clean), get it up in degraded mode, and run a non-destructive fsck. If that comes back OK, take a backup immediately.

Then sdj can be destructively written to, which forces the drive to remap its bad sectors to reserve sectors. It still needs a SMART extended offline test to confirm, after which it could possibly be reused and rebuilt into the array.
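The read-then-rewrite check above is just a one-sector dd in each direction. A minimal Python sketch of the same two steps follows, assuming a 512-byte logical sector and an LBA relative to whatever device path is opened (whole disk vs partition offsets differ). The function names are hypothetical. Unlike dd with iflag=direct/oflag=direct, this goes through the page cache, so an already cached sector could hide a read error.

```python
import os

SECTOR = 512  # assumed logical sector size; check logical_block_size


def read_sector(dev: str, lba: int, size: int = SECTOR) -> bytes:
    """Read one sector; an unreadable sector surfaces as OSError (usually EIO)."""
    fd = os.open(dev, os.O_RDONLY)
    try:
        data = os.pread(fd, size, lba * size)
    finally:
        os.close(fd)
    if len(data) != size:
        raise OSError(f"short read at LBA {lba}")
    return data


def rewrite_sector(dev: str, lba: int, data: bytes) -> None:
    """Write the same bytes back; fsync so a write error is raised here."""
    fd = os.open(dev, os.O_WRONLY)
    try:
        written = os.pwrite(fd, data, lba * len(data))
        if written != len(data):
            raise OSError(f"short write at LBA {lba}")
        os.fsync(fd)
    finally:
        os.close(fd)
```

Usage would be to call read_sector() on the device and LBA from the error log, save the returned bytes to a file as a copy, and only then pass them to rewrite_sector(). Only go ahead if the read succeeded, since writing anything else would destroy array data.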

> I prefer badblocks myself - I can see exactly what it's doing and what
> errors are seen. With secure erase you're dependent on the firmware
> internals to tell you what's actually going on (and, depending on the
> nature of the errors you're getting, this may already be suspect).

The firmware is always a go-between; you can't actually get around it. Bad sectors are entirely the domain of the drive firmware, so for that purpose I don't see an advantage of an external program over secure erase and SMART testing. If the firmware lies on either of those, it'll lie to badblocks too.

Where I can see the usefulness of badblocks, though maybe no more so than other tools, is that it would show errors that aren't disk-related, such as UDMA/CRC errors caused by controller or cable problems. By contrast, secure erase and SMART testing run strictly inside the drive for their entire duration.
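To illustrate the difference: a host-side scan sends every read across the controller and cable, so a transport fault shows up as a failed read just like a media error, whereas the drive's internal tests never touch that path. Below is a simplified sketch in the spirit of badblocks' default read-only mode; it is not badblocks itself. The function name and block size are assumptions. It cannot tell media errors from transport errors on its own. For that you would still look at dmesg or, on most drives, SMART attribute 199 (UDMA_CRC_Error_Count).

```python
import os


def scan_readonly(dev: str, block: int = 4096) -> list[int]:
    """Read dev block by block; return indices of blocks whose read raised OSError."""
    bad = []
    fd = os.open(dev, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)            # device or file size in bytes
        for i in range((size + block - 1) // block):   # include a partial tail block
            try:
                os.pread(fd, block, i * block)
            except OSError:                            # e.g. EIO from media or transport
                bad.append(i)
    finally:
        os.close(fd)
    return bad
```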


Chris Murphy