2 failed disks RAID 5 behavior bug?

Hi!

Let me apologize in advance for not having as much information as I'd like.

I have a RAID 5 array with 3 elements. Kernel is 2.6.23.

I had a SATA disk fail. Its SMART data reported an 'electrical failure', and
the drive sounded like an angry buzz-saw, so I'm guessing more was going on
with it than that. When the drive failed, /proc/mdstat showed two drives
marked as failed [__U]. The second 'failed' drive was on the other channel
of the same SATA controller, and on inspection it works fine. My guess is
that the failing drive somehow locked up the SATA controller, which caused
the RAID layer to mark the second drive as failed as well.
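
In case it helps anyone diagnose the same thing, here is roughly how the
array and per-device state can be checked (the array and device names below
are just examples, not necessarily mine):

    # Overall array status; a 3-disk RAID 5 with two failed members shows [__U]
    cat /proc/mdstat

    # Per-device state as md sees it (example array name)
    mdadm --detail /dev/md0

    # The kernel log should show whether the controller itself errored out
    dmesg | grep -iE 'ata|raid'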

The problematic behavior is that once two elements were marked as failed,
any read or write access resulted in an "I/O Failure" message.
Unfortunately, I believe some writes still made it to the array, because the
event counters on the two functional elements no longer matched, and there
was quite a bit of corruption in the filesystem superblock.
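For anyone else who ends up in this state: the event counters on the
surviving members can be compared, and the array reassembled from them, with
something like the following (device names are examples; --force tells mdadm
to accept a small event-count mismatch, at the risk of some stale or corrupt
data):

    # Compare event counters on the surviving members (example devices)
    mdadm --examine /dev/sdb1 /dev/sdc1 | grep -E '/dev/|Events'

    # Stop the broken array, then reassemble it from the two good members;
    # --force lets mdadm ignore a small event-count difference
    mdadm --stop /dev/md0
    mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1
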

I'm sorry I don't have more specifics, but I hope Mr. Brown or someone else
who knows the RAID code will consider adding some sort of safeguard to
prevent writes to a RAID 5 array once more than one element has failed.

PS: Please CC: me. :)

Thank You!
TJ Harrell
systemloc@xxxxxxxxxxx

