On Saturday October 13, alberto@xxxxxxxxx wrote:
> Over the past several months I have encountered 3
> cases where the software RAID didn't work in keeping
> the servers up and running.
>
> In all cases, the failure has been on a single drive,
> yet the whole md device and server become unresponsive.
>
> (usb-storage)
> In one situation a RAID 0 across 2 USB drives failed
> when one of the drives accidentally got turned off.

RAID0 is not true RAID - there is no redundancy.  If one device in a
RAID0 fails, the whole array will fail.  This is expected.

>
> (sata)
> A second case a disk started generating reports like:
> end_request: I/O error, dev sdb, sector 42644555

So the drive had errors - not uncommon.  What happened to the array?

>
> (sata)
> The third case (which I'm living right now) is a disk
> that I can see during the boot process but that I can't
> get operations on it to come back (ie. fdisk -l /dev/sdc).

You mean "fdisk -l /dev/sdc" just hangs?  That sounds like a SATA
driver error.  You should report it to the SATA developers
    linux-ide@xxxxxxxxxxxxxxx

md/RAID cannot compensate for problems in the driver code.  It expects
every request that it sends down to either succeed or fail in a
reasonable amount of time.

>
> (pata)
> I have had at least 4 situations on old servers based
> on pata disks where disk failures were successful in
> being flagged and arrays were degraded automatically.

Good!

>
> So, this is all making me wonder under what circumstances
> software RAID may have problems detecting disk failures.

RAID1, RAID10, RAID4, RAID5, RAID6 will handle errors that are
correctly reported by the underlying device.

>
> I need to come up with a best practices solution and also
> need to understand more as I move into raid over local
> network (ie. iscsi, AoE or NBD). Could a disk failure in
> one of the servers or a server going offline bring the
> whole array down?

It shouldn't, providing the low level driver is functioning correctly,
and providing you are using true RAID (not RAID0 or LINEAR).

NeilBrown
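
For the cases where the driver does report the error and md degrades
the array, a small monitoring script can make the state visible.  The
following is only a sketch in Python, assuming the usual /proc/mdstat
layout (an "mdN : active <level> <members>" line followed by a line
carrying "[total/active] [UU_]").  The function name parse_mdstat, the
NON_REDUNDANT set and the SAMPLE text are made up for illustration and
are not part of md or mdadm.  It cannot help with a driver that never
completes a request, because then md is never told about the failure.

```python
import re

# Levels with no redundancy: one failed member takes the whole array down.
NON_REDUNDANT = {"raid0", "linear"}

ARRAY_RE = re.compile(r"^(md\S*)\s*:\s*(.*)$")
# Matches e.g. "[3/2] [U_U]": 3 members expected, 2 currently working.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: info} parsed from /proc/mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            tokens = m.group(2).split()
            state = tokens[0] if tokens else "unknown"
            # Drop flags such as "(read-only)" that may follow the state.
            rest = [t for t in tokens[1:] if not t.startswith("(")]
            level = rest[0] if rest and state == "active" else None
            current = {
                "state": state,
                "level": level,
                "redundant": level is not None and level not in NON_REDUNDANT,
                "degraded": False,
                # Members md has marked faulty look like "sdb2[1](F)".
                "failed": [t.split("[")[0] for t in rest if t.endswith("(F)")],
            }
            arrays[m.group(1)] = current
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            total, active = int(s.group(1)), int(s.group(2))
            current["degraded"] = active < total
    return arrays


# Hypothetical sample text, not taken from the systems discussed above.
SAMPLE = """\
Personalities : [raid0] [raid1] [raid5]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 sdc2[2] sdb2[1](F) sda2[0]
      208640 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

md2 : active raid0 sde1[1] sdd1[0]
      417280 blocks 64k chunks

unused devices: <none>
"""

if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        print(name, info)
```

On a live system the same function could be fed the contents of
/proc/mdstat instead of SAMPLE.  An array reported with "redundant"
false (RAID0 or LINEAR) has nothing to degrade to, so a single member
failure there means the array itself has failed.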