Michael Evans <mjevans1983@xxxxxxxxx> writes:

> On Mon, Dec 21, 2009 at 4:41 AM, Goswin von Brederlow <goswin-v-b@xxxxxx> wrote:
>> "Tirumala Reddy Marri" <tmarri@xxxxxxxx> writes:
>>
>>> Thanks for the response.
>>>
>>>>> Also, as soon as a disk fails the md driver marks that drive as faulty
>>>>> and continues operation in degraded mode, right? Is there a way to get
>>>>> out of degraded mode without adding a spare drive? Assume we have a
>>>>> 5-disk system with one failed drive.
>>>>>
>>>> I'm not sure what you want to happen here. The only way to get out of
>>>> degraded mode is to replace the drive in the array (if it's not
>>>> actually faulty then you can add it back, otherwise you need to add a
>>>> new drive).
>>>> What were you thinking might happen otherwise?
>>>
>>> I was thinking we could recover from this using re-sync or resize.
>>
>> Theoretically you could shrink the array by one disk and then use that
>> spare disk to resync the parity. But that is a lengthy process with a
>> much higher chance of failure than resyncing to a new disk. Note that
>> you also need to shrink the filesystem on the raid first, adding even
>> more stress and risk of failure. So I really wouldn't recommend that.
>>
>>> After running IO to a degraded (RAID-5) /dev/md0, I am seeing an issue
>>> where e2fsck reports an inconsistent file system and corrects it. I am
>>> trying to debug whether the issue is data not being written or wrong
>>> data being read in degraded mode.
>>>
>>> I guess the problem happens during the write. The reason is that after
>>> running e2fsck I don't see the inconsistency any more.
>>>
>>> Regards,
>>> Marri
>>
>> A degraded raid5 might get corrupted if your system crashes. If you
>> are writing to one of the remaining disks then it also needs to update
>> the parity block simultaneously. If it crashes between writing the
>> data and the parity, then the data block on the failed drive will
>> appear changed. I'm not sure whether the raid will even assemble on
>> its own in such a case; it might just complain about not having
>> enough in-sync disks.
>>
>> Apart from that there should never be any corruption unless one of
>> your disks returns bad data on read.
>>
>> MfG
>>         Goswin
>>
>> PS: This is not a bug in Linux raid but a fundamental limitation of
>> raid.
>
> You're forgetting the ever-horrid possibility of failed/corrupted
> hardware. I've had IO cards go bad due to a prior bug that let an
> experimental 'debugging' option in the kernel write to random memory
> locations in the rare case of an unusual error. Not just the
> occasional rare chance of a buffer being corrupted, but the actual
> hardware going bad. One of the cards could not even be recovered by
> an attempt at software-flashing the firmware (it must have been too
> far gone for the utility to recognize, and replacing it was the least
> expensive route remaining).
>
> However, in general I've seen that hardware that is actually failing
> tends to do so with enough grace to either outright refuse to operate,
> or operate with obvious and persistent symptoms.

And how is that relevant to the raid-5 being degraded? If the hardware
goes bad you just get errors no matter what.
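
To make the write-hole point above concrete: on a degraded RAID-5 the
chunk that lived on the failed disk is reconstructed as the XOR of the
surviving data chunks and the parity chunk. If a crash lets a data write
reach one surviving disk but not the matching parity update, that
reconstructed chunk changes even though nothing ever wrote to it. A toy
Python sketch of one stripe, nothing like md's real on-disk layout, with
all names and values invented for illustration:

    # Toy model of one RAID-5 stripe: data chunks d0..d2 plus parity.
    # d2 has "failed" and is reconstructed as d0 XOR d1 XOR parity.

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    d0 = b"\xaa\xaa\xaa\xaa"
    d1 = b"\x55\x55\x55\x55"
    d2 = b"\x0f\x0f\x0f\x0f"            # the chunk on the failed disk
    parity = xor(xor(d0, d1), d2)        # parity as written before the failure

    def degraded_read_d2(d0, d1, parity):
        """Reconstruct the missing chunk from the survivors."""
        return xor(xor(d0, d1), parity)

    # Clean case: the missing chunk comes back intact.
    assert degraded_read_d2(d0, d1, parity) == d2

    # Crash mid-write: d0 is rewritten, but the matching parity update
    # never reaches the disk -- that is the write hole.
    d0_new = b"\xff\x00\xff\x00"

    # The block on the failed disk now "appears changed", even though
    # nothing ever wrote to it.
    assert degraded_read_d2(d0_new, d1, parity) != d2
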
MfG
        Goswin