Dear List,

I have been using MD RAID 5 for some years now and so far have had to recover from single disk failures a few times, always successfully. Now, though, I am puzzled.

Setup: a PC with 3x WD 1 TB SATA disk drives set up as RAID 5, running kernel 2.6.27.21 (now); the array has been running fine for at least 6 months. I check the state of the RAID every few days by looking at /proc/mdstat manually. Apparently one drive was kicked out of the array 4 days ago without me noticing it. The root cause seems to be bad cabling, but that is not confirmed yet. Anyway, the disk in question ("sde") reports 23 UDMA_CRC errors, compared to 0 about 2 weeks ago. Reading the complete device just now via dd still shows those 23 errors but no new ones.

Well, RAID 5 should survive a single disk failure (again), but after a reboot (for non-RAID-related reasons) the RAID came up as "md0 stopped":

cat /proc/mdstat
Personalities :
md0 : inactive sdc1[1](S) sdd1[2](S) sde1[0](S)
      2930279424 blocks

unused devices: <none>

What's that? First, documentation on the web is rather outdated and/or incomplete. Second, my guess that "(S)" represents a spare is backed up by the kernel source.

mdadm --examine [devices] gives consistent reports about the RAID 5 structure:

          Magic : a92b4efc
        Version : 0.90.00
           UUID : ec4fdb7b:e57733c0:4dc42c07:36d99219
  Creation Time : Wed Dec 24 11:40:29 2008
     Raid Level : raid5
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
     Array Size : 1953519616 (1863.02 GiB 2000.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
...
         Layout : left-symmetric
     Chunk Size : 256K

The state, though, differs:

sdc1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a15 - correct
         Events : 177920
...
      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1

sdd1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a27 - correct
         Events : 177920

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2       8       49        2      active sync   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1

sde1:
    Update Time : Fri Apr  3 15:00:31 2009
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccf463ec - correct
         Events : 7

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8       65        0      active sync   /dev/sde1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1

sde is the device that failed once and was kicked out of the array. The update time reflects that, if I interpret it correctly. But how can the sde1 status claim 3 active and working devices? IMO that is way off.

Now, my assumption: I think I should be able to remove sde temporarily and just restart the degraded array from sdc1/sdd1. Correct? My backup is a few days old, and I would really like to keep the work done on the RAID in the meantime.

If the answer is just 2 or 3 mdadm command lines, I am yours :-)

Best regards
Frank Baumgart
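
P.S. To show where my thinking is, here is the untested command sequence I would guess at from the mdadm man page; the flags and their ordering are only my assumptions, nothing I have actually tried yet, so please correct me before I run anything:

# stop the wrongly assembled, inactive array first
mdadm --stop /dev/md0

# reassemble degraded from the two members with matching event counts,
# leaving the stale sde1 out for now
mdadm --assemble /dev/md0 /dev/sdc1 /dev/sdd1

# if mdadm refuses to start it with only 2 of 3 devices, possibly:
#   mdadm --assemble --run /dev/md0 /dev/sdc1 /dev/sdd1
# (I would rather not reach for --force without confirmation)

# once the cabling is sorted out, re-add the old disk and let it resync:
#   mdadm /dev/md0 --add /dev/sde1

Is that roughly the right direction, or am I about to make things worse?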