Michael Evans <mjevans1983@xxxxxxxxx> writes:

> On Mon, Dec 21, 2009 at 4:41 AM, Goswin von Brederlow <goswin-v-b@xxxxxx> wrote:
>> "Tirumala Reddy Marri" <tmarri@xxxxxxxx> writes:
>>
>>> Thanks for the response.
>>>
>>>>> Also, as soon as a disk fails the md driver marks that drive as faulty
>>>>> and continues operation in degraded mode, right? Is there a way to get
>>>>> out of degraded mode without adding a spare drive? Assume we have a
>>>>> 5-disk system with one failed drive.
>>>>>
>>>> I'm not sure what you want to happen here. The only way to get out of
>>>> degraded mode is to replace the drive in the array (if it's not
>>>> actually faulty then you can add it back, otherwise you need to add a
>>>> new drive).
>>>> What were you thinking might happen otherwise?
>>>
>>> I was thinking we could recover from this using re-sync or resize.
>>
>> Theoretically you could shrink the array by one disk and then use that
>> spare disk to resync the parity. But that is a lengthy process with a
>> much higher chance of failure than resyncing to a new disk. Note that
>> you also need to shrink the filesystem on the raid first, adding even
>> more stress and risk of failure. So I really wouldn't recommend that.
>>
>>> After running IO to a degraded (RAID-5) /dev/md0, I am seeing an issue
>>> where e2fsck reports an inconsistent file system and corrects it. I am
>>> trying to debug whether the issue is data not being written or wrong
>>> data being read in degraded mode.
>>>
>>> I guess the problem happens during the write. The reason is that after
>>> running e2fsck I don't see the inconsistency any more.
>>>
>>> Regards,
>>> Marri
>>
>> A degraded raid5 might get corrupted if your system crashes. If you
>> are writing to one of the remaining disks then it also needs to update
>> the parity block simultaneously. If it crashes between writing the
>> data and the parity, then the data block on the failed drive will
>> appear changed. I'm not sure whether the raid will even assemble on
>> its own in such a case; it might just complain about not having
>> enough in-sync disks.
>>
>> Apart from that there should never be any corruption unless one of
>> your disks returns bad data on read.
>>
>> MfG
>>         Goswin
>>
>> PS: This is not a bug in Linux raid but a fundamental limitation of
>> raid.
>
> You're forgetting the ever-horrid possibility of failed/corrupted
> hardware. I've had IO cards go bad due to a prior bug that let an
> experimental 'debugging' option in the kernel write to random memory
> locations in the rare case of an unusual error. Not just the
> occasional rare chance of a buffer being corrupted, but the actual
> hardware going bad. One of the cards could not even be recovered by
> an attempt at software-flashing the firmware (it must have been too
> far gone for the utility to recognize, and replacing it was the least
> expensive route remaining).
>
> However, in general I've seen that hardware that is actually failing
> tends to do so with enough grace to either outright refuse to operate,
> or operate with obvious and persistent symptoms.

And how is that relevant to the raid-5 being degraded? If the hardware
goes bad you just get errors no matter what.
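
To make the write-hole point above concrete: on a degraded RAID-5 the
chunk that lived on the failed disk is reconstructed as the XOR of the
surviving data chunks and the parity chunk. If a crash lets a data write
reach one surviving disk but not the matching parity update, that
reconstructed chunk changes even though nothing ever wrote to it. A toy
Python sketch of one stripe, nothing like md's real on-disk layout, with
all names and values invented for illustration:

    # Toy model of one RAID-5 stripe: data chunks d0..d2 plus parity.
    # d2 has "failed" and is reconstructed as d0 XOR d1 XOR parity.

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    d0 = b"\xaa\xaa\xaa\xaa"
    d1 = b"\x55\x55\x55\x55"
    d2 = b"\x0f\x0f\x0f\x0f"            # the chunk on the failed disk
    parity = xor(xor(d0, d1), d2)        # parity as written before the failure

    def degraded_read_d2(d0, d1, parity):
        """Reconstruct the missing chunk from the survivors."""
        return xor(xor(d0, d1), parity)

    # Clean case: the missing chunk comes back intact.
    assert degraded_read_d2(d0, d1, parity) == d2

    # Crash mid-write: d0 is rewritten, but the matching parity update
    # never reaches the disk -- that is the write hole.
    d0_new = b"\xff\x00\xff\x00"

    # The block on the failed disk now "appears changed", even though
    # nothing ever wrote to it.
    assert degraded_read_d2(d0_new, d1, parity) != d2
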
MfG
        Goswin