Re: RAID10 failure(s)

On Mon, Feb 14, 2011 at 4:48 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Mon, 14 Feb 2011 14:33:03 -0600 Mark Keisler <grimm26@xxxxxxxxx> wrote:
>
>> Sorry for the double-post on the original.
>> I realize that I also left out the fact that I rebooted since drive 0
>> also reported a fault and mdadm won't start the array at all.  I'm not
>> sure how to tell which members were in the two RAID0 groups.  I would
>> think that if I have a RAID0 pair left from the RAID10, I should be
>> able to recover somehow.  Not sure if that was drive 0 and 2, 1 and 3
>> or 0 and 1, 2 and 3.
>>
>> Anyway, the drives do still show the correct array UUID when queried
>> with mdadm -E, but they disagree about the state of the array:
>> # mdadm -E /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 | grep 'Array State'
>>    Array State : AAAA ('A' == active, '.' == missing)
>>    Array State : .AAA ('A' == active, '.' == missing)
>>    Array State : ..AA ('A' == active, '.' == missing)
>>    Array State : ..AA ('A' == active, '.' == missing)
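>>
>> (mdadm -E also prints a "Device Role" line for each member, which
>> should map each disk back to its slot in the array; assuming the same
>> v1.x superblocks as above, something like
>>
>> # mdadm -E /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 | grep 'Device Role'
>>
>> ought to show which of the four slots each disk held.)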
>>
>> sdc still shows a recovery offset, too:
>>
>> /dev/sdb1:
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>> /dev/sdc1:
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>> Recovery Offset : 2 sectors
>> /dev/sdd1:
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>> /dev/sde1:
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>
>> I did some searching on the "READ FPDMA QUEUED" error message that my
>> drive was reporting and have found that there seems to be a
>> correlation between that and having AHCI (NCQ in particular) enabled.
>> I've now set my BIOS back to Native IDE (which was the default anyway)
>> instead of AHCI for the SATA setting.  I'm hoping that was the issue.
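>>
>> (For what it's worth, NCQ can apparently also be disabled per-disk
>> from Linux without a BIOS change, by forcing the queue depth to 1,
>> e.g. for sdb:
>>
>> # echo 1 > /sys/block/sdb/device/queue_depth
>>
>> Reading the same file back shows the depth currently in effect.)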
>>
>> Still wondering if there is some magic to be done to get at my data again :)
>
> No need for magic here ... but you'd better stand back, as
>  I'm going to try ... Science.
> (or is that Engineering...)
>
>  mdadm -S /dev/md0                # stop the broken array
>  # re-create with the same geometry, leaving slot 0 empty:
>  mdadm -C /dev/md0 -l10 -n4 -c256 missing /dev/sdc1 /dev/sdd1 /dev/sde1
>  mdadm --wait /dev/md0            # wait for the initial sync to finish
>  mdadm /dev/md0 --add /dev/sdb1   # then re-add sdb1; it rebuilds from its mirror
>
> (but be really sure that the devices really are working before you try this).
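>
> (One extra sanity check, since -C writes fresh superblocks: after
> re-creating, something like
>
>  mdadm -E /dev/sdc1 | grep Offset
>
> should still show the same Data Offset as before (2048 sectors above);
> a different mdadm build could in principle pick a different offset,
> which would shift where md looks for your data.)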
>
> BTW, for a near=2, Raid-disks=4 arrangement, the first and second devices
> contain the same data, and the third and fourth devices also contain the
> same data as each other (but obviously different to the first and second).
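>
> In chunk terms the layout is roughly:
>
>    dev0   dev1   dev2   dev3
>     A      A      B      B
>     C      C      D      D
>
> so one surviving disk from each pair is enough to read every chunk.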
>
> NeilBrown
>
>
Ah, that's the kind of info that I was looking for.  So, the third and
fourth disks are a complete RAID0 set and the entire RAID10 should be
able to rebuild from them if I replace the first two disks with new
ones (hence being sure the devices are working)?  Or do I need to hope
the originals will hold up to a rebuild?
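
Before trusting the old disks with a rebuild I'll probably give them a
quick once-over first; roughly (using the device names from above):

# smartctl -H /dev/sdb                  # overall SMART health verdict
# smartctl -l error /dev/sdb            # recent ATA errors (the FPDMA ones)
# dd if=/dev/sdb1 of=/dev/null bs=1M    # full read pass; any I/O error is a bad sign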

Thanks for the info, Neil, and all your work in FOSS :)