Re: RAID10 failure(s)

Mark Keisler <grimm26@xxxxxxxxx> · Tue, 15 Feb 2011 11:47:53 -0600



On Mon, Feb 14, 2011 at 6:57 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Mon, 14 Feb 2011 18:49:03 -0600 Mark Keisler <grimm26@xxxxxxxxx> wrote:
>
>> Oh, duh, was thinking in 0+1 instead of 10.  I'm still wondering why
>> you made mention of "but be really sure that the devices really are
>> working before you try this."  If trying to bring the RAID back fails,
>> I'm just back to not having access to the data which is where I am now
>> :).
>
> If you try reconstructing the array before you are sure you have resolved the
> original problem (be it BIOS setting, bad cables, dodgy controller or even a
> bad disk drive) then you risk compounding your problems and at least are
> likely to waste time.
> Sometimes people are in such a hurry to get access to their data that they
> cut corners to their detriment.  I don't know if you are such a person, but
> I mentioned it anyway just in case.
>
> NeilBrown
>
>
After checking things over, SMART tests showed quite a few
Offline_Uncorrectable sectors and a high Current_Pending_Sector count
on the two drives that had failed out of the array.  So, based on
that, I figured I had nothing to lose in trying to create the array
again.  I just went with the array in a degraded state with 3 drives
and was able to activate the volumes on it and copy off the part of
the data that wasn't backed up yet before it failed again.
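
For anyone wanting to pull the same two counters out of a drive, here
is a rough sketch.  It assumes the usual `smartctl -A` attribute table
from smartmontools, where the attribute name is the second column and
the raw value is the tenth.  The function name smart_bad_sectors is
made up for illustration, and the device in the usage comment is only
an example, so substitute your own.

```shell
# Read a `smartctl -A` style attribute table on stdin and print the raw
# values of the two attributes mentioned above as NAME=RAW_VALUE lines.
# Assumes the standard layout: column 2 is the name, column 10 the raw value.
smart_bad_sectors() {
    awk '$2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable" {
        print $2 "=" $10
    }'
}

# Usage (needs root and smartmontools installed; device is an example):
#   smartctl -A /dev/sdb | smart_bad_sectors
```

Non-zero raw values on either attribute are what made me write these
drives off.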

Stan's dd zero idea confirms it too, going by its output and the logs:
 # dd if=/dev/zero of=/dev/sdb
dd: writing to `/dev/sdb': Input/output error
9368201+0 records in
9368200+0 records out
4796518400 bytes (4.8 GB) copied, 229.128 s, 20.9 MB/s
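
As a sanity check on where the write died: dd was run without bs=, so
each record is dd's default 512 bytes, and the records-out count times
512 should match the byte total dd reported.  A minimal sketch of that
arithmetic (the variable names are only for illustration):

```shell
# Records fully written before the I/O error, from the dd output above.
records_out=9368200
# dd's default block size when no bs= is given.
block_size=512

# Byte offset at which the write failed.
fail_offset=$((records_out * block_size))
echo "write failed after $fail_offset bytes"
```

That gives 4796518400 bytes, the same figure dd printed, so the drive
stopped accepting writes about 4.8 GB in.  If the drive uses 512-byte
logical sectors (an assumption, not checked here), that is also sector
9368200.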


So the plan is: RMA the drives, keep smartd running, rebuild the
array, load some data and monitor :).  Thanks for the help, guys.