I found an interesting problem with software RAID 5 in 2.6.10:
I have a RAID 5 array, recently created with mdadm. It consists of four 160 GB drives plus a spare. All four drives were active and fully synced when the box locked up due to some sort of hardware problem. When I rebooted, the kernel refused to start the array because all four drives had an older timestamp than the spare. So the RAID code kicked them out, one after another, until it was left with just the single spare disk. Since it can't start an array with 0/4 disks, it failed. I was able to reproduce this with 2.6.10 and 2.6.2 (the only other kernel I had handy). Pulling the spare disk and rebooting fixed everything.
I don't have a record of the logs from this period: the box was in single-user mode with disk problems, and I didn't want to write anything to the disk.
Logically, it seems like the kernel's RAID recovery code shouldn't look for the newest disk; it should really look for a quorum, even if that means kicking out disks with newer timestamps. *Especially* when the newer timestamp belongs to the spare disk.
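To make the idea concrete, here's a rough sketch (not kernel code, just pseudocode in Python with made-up names) of quorum-based selection: instead of trusting the disk with the newest superblock timestamp, assemble from the largest group of disks that agree on a timestamp.

```python
from collections import Counter

def pick_quorum_disks(disks):
    """Given {disk_name: superblock_timestamp}, return the set of disks
    to assemble: the largest group sharing a timestamp, rather than
    whichever disk happens to have the newest timestamp.  Ties break
    toward the newer timestamp."""
    counts = Counter(disks.values())
    quorum_ts, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return {name for name, ts in disks.items() if ts == quorum_ts}

# The scenario above: four active disks agree on an older timestamp,
# the lone spare carries a newer one.  Quorum keeps the four.
array = {"sda": 100, "sdb": 100, "sdc": 100, "sdd": 100, "spare": 105}
print(sorted(pick_quorum_disks(array)))  # ['sda', 'sdb', 'sdc', 'sdd']
```

With newest-timestamp logic the same input would keep only the spare and fail the array, which is exactly the failure mode described above.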
Scott