Re: 4 partition raid 5 with 2 disks active and 2 spare, how to force?


 



On Sun, Mar 28, 2010 at 10:32 PM, Luca Berra <bluca@xxxxxxxxxx> wrote:
> On Sun, Mar 28, 2010 at 10:05:58PM +0530, Anshuman Aggarwal wrote:
>>>
>>> Michael,
>>> I am running mdadm 3.1.2 (latest stable I think) compiled from source
>>> (FYI on Ubuntu Karmic, 2.6.31-20-generic)
>>>
>>> Here is what happened....the device /dev/sda1 has failed once, but I was
>>> wondering if it was a freak accident so I tried adding it back..and then it
>>> started resyncing ...somewhere in this process...the disk /dev/sda1 stalled
>>> and the server needed a reboot. After that boot, I got 2 spares (/dev/sda1,
>>> /dev/sdd5) and 2 active devices (/dev/sdb1, /dev/sdc1)
>>>
>>> Maybe I need to do a build with a --assume-clean with the devices in the
>>> right order (which I'm positive I can remember) ...be nice if you could plz
>>> double check:
>>> mdadm --build -n 4 -l 5 -e1.2 --assume-clean /dev/md127 /dev/sda1
>>> /dev/sdb5 /dev/sdc5 /dev/sdd5
>>>
>>> Again, thanks for your time...
>>>
>>> John,
>>> I did try what you said without any luck(--assemble --force but it
>>> refuses to accept the spare as a valid device and 2 active on a 4 member
>>> device isn't good enough)
>>>
>>>
>>>
>>
>> Some more info:
>>
>> I did try this command with the following result:
>>
>> mdadm --build -n 4 -l 5 -e1.2 --assume-clean /dev/md127 /dev/sda1
>> /dev/sdb5 /dev/sdc5 /dev/sdd5
>> mdadm: Raid level 5 not permitted with --build.
>>
>> Should I try this?
>> mdadm --create -n 4 -l 5 -e1.2 --assume-clean /dev/md127 /dev/sda1
>> /dev/sdb5 /dev/sdc5 /dev/sdd5
>
> From your description above /dev/sda was the failed one, so you should
> not add it to the array. use the word "missing" in its place.
>
> L.
>
> --
> Luca Berra -- bluca@xxxxxxxxxx
>        Communication Media & Services S.r.l.
>  /"\
>  \ /     ASCII RIBBON CAMPAIGN
>  X        AGAINST HTML MAIL
>  / \
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
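Luca's suggestion would look roughly like this; this is only a sketch, and you must double-check the device order (and any non-default chunk size or layout) against the original array before running it, since --create rewrites the superblocks:

```shell
# Re-create the array with the failed disk left out ("missing" in its slot).
# --assume-clean prevents an initial resync; the device order after the
# array name MUST match the original creation order exactly.
mdadm --create /dev/md127 --assume-clean -e 1.2 -l 5 -n 4 \
    missing /dev/sdb5 /dev/sdc5 /dev/sdd5
```

Note that --build fails here because it only supports non-redundant or legacy arrays without superblocks; --create is the correct mode for RAID 5.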

In addition to using "missing" for the device you know to have failed, I
very strongly suggest running a check, or some other read-only
operation, on the resulting RAID device to make sure you can read all of
the data.  Be sure to check dmesg/the system logs to confirm that no
storage errors were reported.  If there were none, it is /probably/
safe to re-add the previously failed disk and let it resync.
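A read-only verification pass can be triggered through sysfs, roughly like this (assuming the assembled device is /dev/md127):

```shell
# Start a parity check; this reads every stripe and counts mismatches
# without rewriting anything.
echo check > /sys/block/md127/md/sync_action

# Watch progress
cat /proc/mdstat

# After the check finishes, a non-zero count here indicates
# inconsistent stripes that need investigation.
cat /sys/block/md127/md/mismatch_cnt
```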

While checking that your array data can be read, you should probably
also run the SMART self-tests via smartctl (or a GUI front-end for it)
on the 'failed' disk to see whether the failure was a sign of something
worse.
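Something along these lines (assuming the suspect disk is /dev/sda; adjust as needed):

```shell
# Overall health verdict plus the drive's recorded error log
smartctl -H -l error /dev/sda

# Kick off a long (full-surface) self-test; it runs in the background
# on the drive itself and can take hours.
smartctl -t long /dev/sda

# Later, read back the self-test results
smartctl -l selftest /dev/sda
```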

In any case, I do NOT recommend using anything within the RAID
container other than in read-only mode until the resync is complete.
You may need to use the portions of sda that are still good in more
elaborate ways to recover data that is readable there but not readable
on sdd or the other drives.  Read/write mode, or even an fsck on the
array contents, will only increase the chances of data getting out of
sync.
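One way to enforce that, as a sketch (mount point is just an example path):

```shell
# Mark the whole md device read-only at the array level, so nothing
# can write to it even by accident...
mdadm --readonly /dev/md127

# ...and mount the filesystem read-only for inspection/copying.
mount -o ro /dev/md127 /mnt/recovery
```

Switch back with `mdadm --readwrite /dev/md127` only once you are satisfied the data is intact.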
