Hello,

I am wondering how to identify a failed md array. Let's assume the
following array:

/dev/md0:
        Version : 1.2
  Creation Time : Mon May 26 19:10:59 2014
     Raid Level : raid1
     Array Size : 10176 (9.94 MiB 10.42 MB)
  Used Dev Size : 10176 (9.94 MiB 10.42 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon May 26 19:10:59 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : test:0  (local to host test)
           UUID : cac8fd48:44219a96:5de7e757:4e21a3e2
         Events : 17

    Number   Major   Minor   RaidDevice State
       0     254        0        0      active sync   /dev/dm-0
       1     254        1        1      active sync   /dev/dm-1

with

/sys/block/md0/md/array_state:clean
/sys/block/md0/md/dev-dm-0/state:in_sync
/sys/block/md0/md/dev-dm-1/state:in_sync

and the following device-mapper tables:

disk0: 0 20480 linear 7:0 0
disk1: 0 20480 linear 7:1 0
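In case someone wants to reproduce this, the test devices were set up
roughly as follows (a sketch; the backing files and loop device
numbers are incidental to my test box, assuming /dev/loop0 and
/dev/loop1 come up as 7:0 and 7:1 -- the tables are the ones shown
above):

# two 10 MiB backing files on loop devices
truncate -s 10M /tmp/disk0.img /tmp/disk1.img
losetup /dev/loop0 /tmp/disk0.img
losetup /dev/loop1 /tmp/disk1.img

# wrap them in linear dm targets so the tables can be swapped later
dmsetup create disk0 --table "0 20480 linear 7:0 0"
dmsetup create disk1 --table "0 20480 linear 7:1 0"

# build the mirror on top
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/mapper/disk0 /dev/mapper/disk1

A member is then "failed" by swapping its table to the error target,
e.g. for disk0:

dmsetup suspend disk0
dmsetup reload disk0 --table "0 20480 error"
dmsetup resume disk0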
If dm-0 gets changed to "disk0: 0 20480 error" and we read from the
array with

dd if=/dev/md0 count=1 iflag=direct of=/dev/null

the broken disk gets detected by md:

[84688.483607] md/raid1:md0: dm-0: rescheduling sector 0
[84688.483654] md/raid1:md0: redirecting sector 0 to other mirror: dm-1
[84688.483670] md: super_written gets error=-5, uptodate=0
[84688.483672] md/raid1:md0: Disk failure on dm-0, disabling device.
               md/raid1:md0: Operation continuing on 1 devices.
[84688.483676] md: super_written gets error=-5, uptodate=0
[84688.494174] RAID1 conf printout:
[84688.494178]  --- wd:1 rd:2
[84688.494181]  disk 0, wo:1, o:0, dev:dm-0
[84688.494182]  disk 1, wo:0, o:1, dev:dm-1
[84688.494183] RAID1 conf printout:
[84688.494184]  --- wd:1 rd:2
[84688.494184]  disk 1, wo:0, o:1, dev:dm-1

and mdadm --detail reports:

/dev/md0:
        Version : 1.2
  Creation Time : Mon May 26 19:10:59 2014
     Raid Level : raid1
     Array Size : 10176 (9.94 MiB 10.42 MB)
  Used Dev Size : 10176 (9.94 MiB 10.42 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon May 26 19:27:41 2014
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           Name : test:0  (local to host test)
           UUID : cac8fd48:44219a96:5de7e757:4e21a3e2
         Events : 20

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1     254        1        1      active sync   /dev/dm-1

       0     254        0        -      faulty   /dev/dm-0

/proc/mdstat:

md0 : active raid1 dm-1[1] dm-0[0](F)
      10176 blocks super 1.2 [2/1] [_U]

sysfs:

/sys/block/md0/md/array_state:clean
/sys/block/md0/md/dev-dm-0/state:faulty,write_error
/sys/block/md0/md/dev-dm-1/state:in_sync
/sys/block/md0/md/degraded:1

However, if I also change dm-1 to "disk1: 0 20480 error" and read
again, there is no visible state change:

/dev/md0:
        Version : 1.2
  Creation Time : Mon May 26 19:10:59 2014
     Raid Level : raid1
     Array Size : 10176 (9.94 MiB 10.42 MB)
  Used Dev Size : 10176 (9.94 MiB 10.42 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon May 26 19:27:41 2014
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1     254        1        1      active sync   /dev/dm-1

       0     254        0        -      faulty   /dev/dm-0

/proc/mdstat:

md0 : active raid1 dm-1[1] dm-0[0](F)
      10176 blocks super 1.2 [2/1] [_U]

sysfs:

/sys/block/md0/md/array_state:clean
/sys/block/md0/md/dev-dm-0/state:faulty,write_error
/sys/block/md0/md/dev-dm-1/state:in_sync
/sys/block/md0/md/degraded:1

On a write to the array we get:

[85498.660247] md: super_written gets error=-5, uptodate=0
[85498.666464] quiet_error: 268 callbacks suppressed
[85498.666470] Buffer I/O error on device md0, logical block 2528
[85498.666476] Buffer I/O error on device md0, logical block 2528
[85498.666486] Buffer I/O error on device md0, logical block 2542
[85498.666490] Buffer I/O error on device md0, logical block 2542
[85498.666496] Buffer I/O error on device md0, logical block 0
[85498.666499] Buffer I/O error on device md0, logical block 0
[85498.666508] Buffer I/O error on device md0, logical block 1
[85498.666512] Buffer I/O error on device md0, logical block 1
[85498.666518] Buffer I/O error on device md0, logical block 2543
[85498.666524] Buffer I/O error on device md0, logical block 2543
[85498.866388] md: super_written gets error=-5, uptodate=0

and the only change in sysfs is

/sys/block/md0/md/dev-dm-1/state:in_sync,write_error,want_replacement

How can I reliably identify a failed array in this state? array_state
still reports "clean", the last remaining raid member stays "in_sync",
and the value in degraded doesn't equal raid_disks.

Sebastian
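P.S. For reference, this is the kind of check I had in mind (a rough
sketch; it reads only the sysfs files mentioned above plus
/sys/block/md0/md/raid_disks). It spots the first failed member, but
gives no indication that the array as a whole has stopped working,
for exactly the reasons described above:

#!/bin/sh
# Naive health check for md0. After the second member fails this still
# reports only dm-0: array_state stays "clean", degraded stays at 1
# (less than raid_disks=2) and dm-1 stays "in_sync".
md=/sys/block/md0/md

array_state=$(cat "$md/array_state")
degraded=$(cat "$md/degraded")
raid_disks=$(cat "$md/raid_disks")

case "$array_state" in
    clean|active) ;;                          # looks healthy
    *) echo "array_state: $array_state" ;;
esac

if [ "$degraded" -ge "$raid_disks" ]; then
    echo "md0: no working members left"
fi

# per-member state flags
for f in "$md"/dev-*/state; do
    if grep -q faulty "$f"; then
        echo "$f: $(cat "$f")"
    fi
done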