Hello,

today I got alerted by mdadm via email that a disk on one of my servers failed.

On the machine, I see /dev/sda1 as faulty:

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1

       0       8        1        -      faulty   /dev/sda1

and in dmesg:

    ata1.00: configured for UDMA/133
    sd 0:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:0:0:0: [sda] tag#18 Sense Key : Illegal Request [current] [descriptor]
    sd 0:0:0:0: [sda] tag#18 Add. Sense: Logical block address out of range
    sd 0:0:0:0: [sda] tag#18 CDB: Write(16) 8a 00 00 00 00 00 00 06 40 10 00 00 00 08 00 00
    blk_update_request: I/O error, dev sda, sector 409616
    md: super_written gets error=-5
    md/raid1:md0: Disk failure on sda1, disabling device.
    md/raid1:md0: Operation continuing on 1 devices.

Note this is a Write(16) error.

However, scrolling up in dmesg, I see lots of Read(16) errors for *both* /dev/sda and /dev/sdb.

For sdb, at [7723679.793801]:

    ata3.00: exception Emask 0x0 SAct 0x7c SErr 0x0 action 0x0
    ata3.00: irq_stat 0x40000008
    ata3.00: failed command: READ FPDMA QUEUED
    ata3.00: cmd 60/00:10:00:6e:e4/0a:00:00:00:00/40 tag 2 ncq 1310720 in
             res 41/40:00:30:73:e4/00:00:00:00:00/40 Emask 0x409 (media error) <F>
    ata3.00: status: { DRDY ERR }
    ata3.00: error: { UNC }
    ata3.00: configured for UDMA/133
    sd 2:0:0:0: [sdb] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 2:0:0:0: [sdb] tag#2 Sense Key : Medium Error [current] [descriptor]
    sd 2:0:0:0: [sdb] tag#2 Add. Sense: Unrecovered read error - auto reallocate failed
    sd 2:0:0:0: [sdb] tag#2 CDB: Read(16) 88 00 00 00 00 00 00 e4 6e 00 00 00 0a 00 00 00
    blk_update_request: I/O error, dev sdb, sector 14971696
    ata3: EH complete

For sda, at [7723688.533758]:

    ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
    ata1.00: irq_stat 0x40000008
    ata1.00: failed command: READ FPDMA QUEUED
    ata1.00: cmd 60/80:18:80:d4:e5/00:00:00:00:00/40 tag 3 ncq 65536 in
             res 41/40:00:b8:d4:e5/00:00:00:00:00/40 Emask 0x409 (media error) <F>
    ata1.00: status: { DRDY ERR }
    ata1.00: error: { UNC }
    ata1.00: configured for UDMA/133
    sd 0:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    sd 0:0:0:0: [sda] tag#3 Sense Key : Medium Error [current] [descriptor]
    sd 0:0:0:0: [sda] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
    sd 0:0:0:0: [sda] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 e5 d4 80 00 00 00 80 00 00
    blk_update_request: I/O error, dev sda, sector 15062200
    ata1: EH complete

Why is it that only sda1 is marked as faulty, when both sda and sdb had unrecovered read errors earlier? Does md consider only write failures to be real failures? How does the logic work?

Also note that there are three scrubs in the dmesg ("md: data-check of RAID array md0"). The first two encountered read errors, but nevertheless finished with "md: md0: data-check done.". Only the last scrub, which hit the write error, resulted in md considering the scrub a failure.

You can find the full dmesg at https://gist.github.com/nh2/db886f3afbbb4b186aa5088ca2782c06.

This leaves me in the inconvenient situation of having two devices in a RAID1 that both show apparent media errors, yet mdadm only emailed me 91 days (judging from the dmesg timestamps) after the first read failure occurred.

Any insights would be appreciated.

Niklas
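
P.S. For reference, the device table above is "mdadm --detail" style output; this is roughly how I pull the array state (assuming the array is md0, as in the logs):

    # Show per-member state of the array (the "faulty" table above).
    mdadm --detail /dev/md0

    # Quick overview of all arrays and any ongoing resync/check.
    cat /proc/mdstat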
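
While digging into the "how does the logic work" question myself, I noticed that md keeps a per-member counter of read errors it has seen and corrected, exposed in sysfs. A sketch of what I have been looking at (paths assume md0 with members sda1/sdb1):

    # Count of read errors md detected on each member but corrected
    # (e.g. by rewriting the bad sector from the other mirror).
    cat /sys/block/md0/md/dev-sda1/errors
    cat /sys/block/md0/md/dev-sdb1/errors

    # Threshold of corrected read errors before a member is failed
    # (default 20, if I read the docs right) - though I am not sure
    # this threshold is actually honoured by raid1.
    cat /sys/block/md0/md/max_read_errors

That would at least be consistent with what I see: corrected read errors are tolerated and only counted, while the failed superblock write ("md: super_written gets error=-5") evicted sda1 immediately.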
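
The scrubs mentioned above correspond to the "check" sync action; for completeness, this is how one can be triggered and its result read back (again assuming md0):

    # Start a scrub; dmesg then logs "data-check of RAID array md0".
    echo check > /sys/block/md0/md/sync_action

    # Progress shows up here while the check runs.
    cat /proc/mdstat

    # After completion: number of mismatched sectors the check found.
    cat /sys/block/md0/md/mismatch_cnt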
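
Finally, since both disks report "Unrecovered read error - auto reallocate failed", I am checking their physical health with smartctl from smartmontools; SMART attribute names vary per drive, so the grep pattern below is just a guess:

    # Full SMART health report, including the drive's own error log.
    smartctl -a /dev/sda
    smartctl -a /dev/sdb

    # The attributes relevant to failing sectors on typical drives.
    smartctl -A /dev/sda | grep -Ei 'reallocat|pending|uncorrect'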